Foreword



I am delighted to write this foreword. This book, a reference where one 
can look up the details of most any algorithm to find a clear unambiguous
description, has long been needed and here it finally is. A concise reference 
that has taken many hours to write but which has the capacity to save vast 
amounts of time previously spent digging out original papers. 

I have known the author for several years and have had experience of his 
amazing capacity for work and the sheer quality of his output, so this book 
comes as no surprise to me. But I hope it will be a surprise and delight to 
you, the reader for whom it has been written. 

But useful as this book is, it is only a beginning. There are so many 
algorithms that no one author could hope to cover them all. So if you know 
of an algorithm that is not yet here, how about contributing it using the 
same clear and lucid style? 



Professor Tim Hendtlass 
Complex Intelligent Systems Laboratory 
Faculty of Information and Communication Technologies 

Swinburne University of Technology 



Melbourne, Australia 
2010 
Preface



About the book 

The need for this project was born of frustration while working towards my 
PhD. I was investigating optimization algorithms and was implementing 
a large number of them for a software platform called the Optimization 
Algorithm Toolkit (OAT)^. Each algorithm required considerable effort 
to locate the relevant source material (from books, papers, articles, and 
existing implementations), decipher and interpret the technique, and finally 
attempt to piece together a working implementation. 

Taking a broader perspective, I realized that the communication of 
algorithmic techniques in the field of Artificial Intelligence was clearly a 
difficult and outstanding open problem. Generally, algorithm descriptions 
are: 

• Incomplete: many techniques are ambiguously described, partially 
described, or not described at all. 

• Inconsistent: a given technique may be described using a variety of 
formal and semi-formal methods that vary across different techniques,
limiting the transferability of background skills an audience requires 
to read a technique (such as mathematics, pseudocode, program code, 
and narratives). An inconsistent representation for techniques means 
that the skills used to understand and internalize one technique may 
not be transferable to realizing different techniques or even extensions 
of the same technique. 

• Distributed: the description of data structures, operations, and pa- 
rameterization of a given technique may span a collection of papers, 
articles, books, and source code published over a number of years, the 
access to which may be restricted and difficult to obtain. 

For the practitioner, a badly described algorithm may be simply
frustrating, where the gaps in available information are filled with intuition
and 'best guess'. At the other end of the spectrum, a badly described algorithm
may be an example of bad science and the failure of the scientific method,
where the inability to understand and implement a technique may prevent
the replication of results, the application, or the investigation and extension
of a technique.

^OAT located at http://optalgtoolkit.sourceforge.net

The software I produced provided a first step solution to this problem: a 
set of working algorithms implemented in a (somewhat) consistent way and 
downloaded from a single location (features likely provided by any library of 
artificial intelligence techniques). The next logical step needed to address this 
problem is to develop a methodology that anybody can follow. The strategy 
to address the open problem of poor algorithm communication is to present 
complete algorithm descriptions (rather than just implementations) in a 
consistent manner, and in a centralized location. This book is the outcome 
of developing such a strategy that not only provides a methodology for 
standardized algorithm descriptions, but provides a large corpus of complete 
and consistent algorithm descriptions in a single centralized location. 

The algorithms described in this work are practical, interesting, and 
fun, and the goal of this project was to promote these features by making 
algorithms from the field more accessible, usable, and understandable. 
This project was developed over a number of years through a lot of writing,
discussion, and revision. This book has been released under a permissive 
license that encourages the reader to explore new and creative ways of 
further communicating its message and content. 

I hope that this project has succeeded in some small way and that you 
too can enjoy applying, learning, and playing with Clever Algorithms. 



Jason Brownlee 



Melbourne, Australia 
2011 
Chapter 1

Introduction 



Welcome to Clever Algorithms! This is a handbook of recipes for com- 
putational problem solving techniques from the fields of Computational 
Intelligence, Biologically Inspired Computation, and Metaheuristics. Clever 
Algorithms are interesting, practical, and fun to learn about and implement. 
Research scientists may be interested in browsing algorithm inspirations in
search of interesting system or process analogs to investigate. Developers
and software engineers may compare various problem solving algorithms 
and technique-specific guidelines. Practitioners, students, and interested 
amateurs may implement state-of-the-art algorithms to address business or 
scientific needs, or simply play with the fascinating systems they represent. 

This introductory chapter provides relevant background information on 
Artificial Intelligence and Algorithms. The core of the book provides a large 
corpus of algorithms presented in a complete and consistent manner. The 
final chapter covers some advanced topics to consider once a number of 
algorithms have been mastered. This book has been designed as a reference 
text, where specific techniques are looked up, or where the algorithms across 
whole fields of study can be browsed, rather than being read cover-to-cover. 
This book is an algorithm handbook and a technique guidebook, and I hope 
you find something useful. 

1.1 What is AI 

1.1.1 Artificial Intelligence 

The field of classical Artificial Intelligence (AI) coalesced in the 1950s 
drawing on an understanding of the brain from neuroscience, the new 
mathematics of information theory, control theory referred to as cybernetics, 
and the dawn of the digital computer. AI is a cross-disciplinary field 
of research that is generally concerned with developing and investigating 
systems that operate or act intelligently. It is considered a discipline in the
field of computer science given the strong focus on computation. 

Russell and Norvig provide a perspective that defines Artificial Intel- 
ligence in four categories: 1) systems that think like humans, 2) systems 
that act like humans, 3) systems that think rationally, 4) systems that 
act rationally [43]. In their definition, acting like a human suggests that 
a system can do some specific things humans can do; this includes fields
such as the Turing test, natural language processing, automated reasoning, 
knowledge representation, machine learning, computer vision, and robotics. 
Thinking like a human suggests systems that model the cognitive informa- 
tion processing properties of humans, for example a general problem solver 
and systems that build internal models of their world. Thinking rationally 
suggests laws of rationalism and structured thought, such as syllogisms and 
formal logic. Finally, acting rationally suggests systems that do rational 
things such as expected utility maximization and rational agents. 

Luger and Stubblefield suggest that AI is a sub-field of computer science 
concerned with the automation of intelligence, and like other sub-fields 
of computer science has both theoretical concerns (how and why do the 
systems work?) and application concerns (where and when can the systems
be used?) [34]. They suggest a strong empirical focus to research, because 
although there may be a strong desire for mathematical analysis, the systems 
themselves defy analysis given their complexity. The machines and software 
investigated in AI are not black boxes, rather analysis proceeds by observing 
the systems' interactions with their environments, followed by an internal
assessment of the system to relate its structure back to its behavior. 

Artificial Intelligence is therefore concerned with investigating mecha- 
nisms that underlie intelligence and intelligent behavior. The traditional
approach toward designing and investigating AI (the so-called 'good old 
fashioned' AI) has been to employ a symbolic basis for these mechanisms. 
A newer approach historically referred to as scruffy artificial intelligence or 
soft computing does not necessarily use a symbolic basis, instead patterning 
these mechanisms after biological or natural processes. This represents a 
modern paradigm shift in interest from symbolic knowledge representations, 
to inference strategies for adaptation and learning, and has been referred to 
as neat versus scruffy approaches to AI. The neat philosophy is concerned 
with formal symbolic models of intelligence that can explain why they work, 
whereas the scruffy philosophy is concerned with intelligent strategies that 
explain how they work [44]. 

Neat AI 

The traditional stream of AI concerns a top down perspective of problem 
solving, generally involving symbolic representations and logic processes 
that most importantly can explain why the systems work. The successes of 
this prescriptive stream include a multitude of specialist approaches such 
as rule-based expert systems, automatic theorem provers, and operations
research techniques that underlie modern planning and scheduling software.
Although traditional approaches have resulted in significant success they
have their limits, most notably scalability. Increases in problem size result in
an unmanageable increase in the complexity of such problems, meaning that
although traditional techniques can guarantee an optimal, precise, or true
solution, the computational execution time or computing memory required
can be intractable.



Scruffy AI 

There have been a number of thrusts in the field of AI toward less crisp 
techniques that are able to locate approximate, imprecise, or partially-true 
solutions to problems with a reasonable cost of resources. Such approaches 
are typically descriptive rather than prescriptive, describing a process for 
achieving a solution (how), but not explaining why they work (like the 
neater approaches). 

Scruffy AI approaches are defined as relatively simple procedures that 
result in complex emergent and self-organizing behavior that can defy 
traditional reductionist analyses, the effects of which can be exploited for 
quickly locating approximate solutions to intractable problems. A common 
characteristic of such techniques is the incorporation of randomness in 
their processes resulting in robust probabilistic and stochastic decision 
making contrasted to the sometimes more fragile determinism of the crisp 
approaches. Another important common attribute is the adoption of an 
inductive rather than deductive approach to problem solving, generalizing 
solutions or decisions from sets of specific observations made by the system. 



1.1.2 Natural Computation 

An important perspective on scruffy Artificial Intelligence is the motivation 
and inspiration for the core information processing strategy of a given 
technique. Computers can only do what they are instructed, therefore a 
consideration is to distill information processing from other fields of study, 
such as the physical world and biology. The study of biologically motivated 
computation is called Biologically Inspired Computing [16], and is one of 
three related fields of Natural Computing [22, 23, 39]. Natural Computing 
is an interdisciplinary field concerned with the relationship of computation 
and biology, which in addition to Biologically Inspired Computing is also 
comprised of Computationally Motivated Biology and Computing with 
Biology [36, 40]. 
Biologically Inspired Computation

Biologically Inspired Computation is computation inspired by biological 
metaphor, also referred to as Biomimicry, and Biomimetics in other engi-
neering disciplines [6, 17]. The intent of this field is to devise mathematical 
and engineering tools to generate solutions to computation problems. The 
field involves using procedures for finding solutions abstracted from the 
natural world for addressing computationally phrased problems. 

Computationally Motivated Biology 

Computationally Motivated Biology involves investigating biology using 
computers. The intent of this area is to use information sciences and 
simulation to model biological systems in digital computers with the aim 
to replicate and better understand behaviors in biological systems. The 
field facilitates the ability to better understand life-as-it-is and investigate 
life-as-it-could-be. Typically, work in this sub-field is not concerned with 
the construction of mathematical and engineering tools, rather it is focused 
on simulating natural phenomena. Common examples include Artificial 
Life, Fractal Geometry (L-systems, Iterative Function Systems, Particle 
Systems, Brownian motion), and Cellular Automata. A related field is that 
of Computational Biology generally concerned with modeling biological 
systems and the application of statistical methods such as in the sub-field 
of Bioinformatics. 



Computation with Biology 

Computation with Biology is the investigation of substrates other than 
silicon in which to implement computation [1]. Common examples include 
molecular or DNA Computing and Quantum Computing. 

1.1.3 Computational Intelligence 

Computational Intelligence is a modern name for the sub-field of AI
concerned with sub-symbolic (also called messy, scruffy, and soft) techniques.
Computational Intelligence describes techniques that focus on strategy and
outcome. The field broadly covers sub-disciplines that focus on adaptive
and intelligent systems, not limited to: Evolutionary Computation, Swarm
Intelligence (Particle Swarm and Ant Colony Optimization), Fuzzy Systems,
Artificial Immune Systems, and Artificial Neural Networks [20, 41]. This
section provides a brief summary of each of the five primary areas of
study.
Evolutionary Computation

A paradigm that is concerned with the investigation of systems inspired by 
the neo-Darwinian theory of evolution by means of natural selection (natural 
selection theory and an understanding of genetics). Popular evolutionary 
algorithms include the Genetic Algorithm, Evolution Strategy, Genetic 
and Evolutionary Programming, and Differential Evolution [4, 5]. The 
evolutionary process is considered an adaptive strategy and is typically 
applied to search and optimization domains [26, 28]. 

Swarm Intelligence 

A paradigm that considers collective intelligence as a behavior that emerges 
through the interaction and cooperation of large numbers of less intelligent
agents. The paradigm consists of two dominant sub-fields 1) Ant Colony 
Optimization that investigates probabilistic algorithms inspired by the 
foraging behavior of ants [10, 18], and 2) Particle Swarm Optimization that 
investigates probabilistic algorithms inspired by the flocking and foraging 
behavior of birds and fish [30]. Like evolutionary computation, swarm 
intelligence-based techniques are considered adaptive strategies and are 
typically applied to search and optimization domains. 

Artificial Neural Networks 

Neural Networks are a paradigm that is concerned with the investigation of 
architectures and learning strategies inspired by the modeling of neurons 
in the brain [8]. Learning strategies are typically divided into supervised 
and unsupervised which manage environmental feedback in different ways. 
Neural network learning processes are considered adaptive learning and
are typically applied to function approximation and pattern recognition 
domains. 

Fuzzy Intelligence 

Fuzzy Intelligence is a paradigm that is concerned with the investigation of 
fuzzy logic, which is a form of logic that is not constrained to true and false 
determinations like propositional logic, but rather functions which define 
approximate truth, or degrees of truth [52]. Fuzzy logic and fuzzy systems 
are a logic system used as a reasoning strategy and are typically applied to 
expert system and control system domains. 
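The idea of degrees of truth can be sketched in a few lines. The example below is only an illustration: the triangular membership function, the temperature sets 'warm' and 'hot', and the use of minimum, maximum, and complement as the logical connectives are assumptions chosen for the sketch (they are a common textbook choice), not definitions taken from this section.

```python
def triangular(x, left, peak, right):
    """Degree of membership in [0, 1] for a triangular fuzzy set (assumed shape)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_and(a, b):
    """Conjunction of two truth degrees using the minimum operator."""
    return min(a, b)


def fuzzy_or(a, b):
    """Disjunction of two truth degrees using the maximum operator."""
    return max(a, b)


def fuzzy_not(a):
    """Negation of a truth degree using the complement."""
    return 1.0 - a


# Hypothetical fuzzy sets over temperature in degrees Celsius.
temperature = 22.0
warm = triangular(temperature, 15.0, 25.0, 35.0)  # about 0.7
hot = triangular(temperature, 20.0, 35.0, 50.0)   # about 0.13

print("warm:", warm)
print("hot:", hot)
print("warm AND hot:", fuzzy_and(warm, hot))
print("warm OR hot:", fuzzy_or(warm, hot))
print("NOT hot:", fuzzy_not(hot))
```

Unlike propositional logic, the statement 'the temperature is warm' is neither true nor false here; it holds to a degree, and compound statements inherit a degree from their parts.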

Artificial Immune Systems 

A collection of approaches inspired by the structure and function of the 
acquired immune system of vertebrates. Popular approaches include clonal 
selection, negative selection, the dendritic cell algorithm, and immune net-
work algorithms. The immune-inspired adaptive processes vary in strategy 
and show similarities to the fields of Evolutionary Computation and Artifi- 
cial Neural Networks, and are typically used for optimization and pattern 
recognition domains [15]. 

1.1.4 Metaheuristics 

Another popular name for the strategy-outcome perspective of scruffy AI is 
metaheuristics. In this context, a heuristic is an algorithm that locates 'good
enough' solutions to a problem without concern for whether the solution
can be proven to be correct or optimal [37]. Heuristic methods trade off
concerns such as precision, quality, and accuracy in favor of computational
effort (space and time efficiency). The greedy search procedure that only
takes cost-improving steps is an example of a heuristic method.
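A greedy search of this kind might be sketched as follows. The cost function (a sum of squares), the step size, and the way neighbors are generated are all assumptions made for illustration; the only property taken from the text is that a candidate is accepted only when it improves the cost.

```python
import random


def cost(x):
    """Hypothetical cost function: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def greedy_search(start, step_size, max_iters, rng):
    """Accept a random neighbor only if it strictly improves the cost."""
    current = list(start)
    current_cost = cost(current)
    for _ in range(max_iters):
        # Propose a neighbor by perturbing every variable a little.
        candidate = [v + rng.uniform(-step_size, step_size) for v in current]
        candidate_cost = cost(candidate)
        if candidate_cost < current_cost:
            current, current_cost = candidate, candidate_cost
    return current, current_cost


rng = random.Random(1)  # fixed seed so the run is repeatable
start = [3.0, -2.0]
best, best_cost = greedy_search(start, 0.5, 500, rng)
print("start cost:", cost(start))
print("final cost:", best_cost)
```

The procedure offers no proof that the result is optimal; it simply never gets worse, which is the trade the paragraph above describes.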

Like heuristics, metaheuristics may be considered a general algorithmic 
framework that can be applied to different optimization problems with 
relatively few modifications to adapt them to a specific problem [25, 46]. The
difference is that metaheuristics are intended to extend the capabilities 
of heuristics by combining one or more heuristic methods (referred to as 
procedures) using a higher-level strategy (hence 'meta'). A procedure in a 
metaheuristic is considered black-box in that little (if any) prior knowledge 
is known about it by the metaheuristic, and as such it may be replaced with 
a different procedure. Procedures may be as simple as the manipulation of 
a representation, or as complex as another complete metaheuristic. Some 
examples of metaheuristics include iterated local search, tabu search, the 
genetic algorithm, ant colony optimization, and simulated annealing. 
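The relationship between a higher-level strategy and a black-box procedure can be sketched with a simplified iterated local search. This is a minimal illustration under stated assumptions: the one-dimensional test function, the perturbation size, and the acceptance rule are hypothetical choices, and the embedded procedure is passed in as a parameter to show that it could be swapped for another.

```python
import math
import random


def cost(x):
    """Hypothetical multimodal cost: many local minima, global minimum 0 at x = 0."""
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))


def local_search(x, rng, step=0.1, iters=100):
    """The embedded procedure: a simple improving-steps-only local search."""
    best, best_cost = x, cost(x)
    for _ in range(iters):
        candidate = best + rng.uniform(-step, step)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost


def iterated_local_search(procedure, start, rng, restarts=50, kick=2.0):
    """The higher-level strategy: perturb the best solution, then re-run the procedure.

    The strategy knows nothing about how `procedure` works, only that it
    takes a starting point and returns a (solution, cost) pair.
    """
    best, best_cost = procedure(start, rng)
    for _ in range(restarts):
        perturbed = best + rng.uniform(-kick, kick)
        candidate, candidate_cost = procedure(perturbed, rng)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost


start = 4.3
_, ls_cost = local_search(start, random.Random(7))
_, ils_cost = iterated_local_search(local_search, start, random.Random(7))
print("local search alone:", ls_cost)
print("iterated local search:", ils_cost)
```

Because both runs use the same seed, the strategy's first call reproduces the plain local search result, and every later restart can only keep or improve it.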

Blum and Roli outline nine properties of metaheuristics [9], as follows:

• Metaheuristics are strategies that "guide" the search process. 

• The goal is to efficiently explore the search space in order to find 
(near-) optimal solutions. 

• Techniques which constitute metaheuristic algorithms range from 
simple local search procedures to complex learning processes. 

• Metaheuristic algorithms are approximate and usually non-deterministic. 

• They may incorporate mechanisms to avoid getting trapped in confined 
areas of the search space. 

• The basic concepts of metaheuristics permit an abstract level descrip- 
tion. 

• Metaheuristics are not problem-specific. 
• Metaheuristics may make use of domain-specific knowledge in the
form of heuristics that are controlled by the upper level strategy. 

• Today's more advanced metaheuristics use search experience (embodied
in some form of memory) to guide the search. 

Hyperheuristics are yet another extension that focuses on heuristics 
that modify their parameters (online or offline) to improve the efficacy 
of solution, or the efficiency of the computation. Hyperheuristics provide 
high-level strategies that may employ machine learning and adapt their 
search behavior by modifying the application of the sub-procedures or even 
which procedures are used (operating on the space of heuristics which in 
turn operate within the problem domain) [12, 13]. 

1.1.5 Clever Algorithms 

This book is concerned with 'clever algorithms', which are algorithms 
drawn from many sub-fields of artificial intelligence not limited to the 
scruffy fields of biologically inspired computation, computational intelligence 
and metaheuristics. The term 'clever algorithms' is intended to unify a
collection of interesting and useful computational tools under a consistent 
and accessible banner. An alternative name (Inspired Algorithms) was 
considered, although ultimately rejected given that not all of the algorithms 
to be described in the project have an inspiration (specifically a biological or 
physical inspiration) for their computational strategy. The set of algorithms 
described in this book may generally be referred to as 'unconventional 
optimization algorithms' (for example, see [14]), as optimization is the main 
form of computation provided by the listed approaches. A technically more 
appropriate name for these approaches is stochastic global optimization (for 
example, see [49] and [35]). 

Algorithms were selected in order to provide a rich and interesting 
coverage of the fields of Biologically Inspired Computation, Metaheuristics 
and Computational Intelligence. Rather than a coverage of just the state-of- 
the-art and popular methods, the algorithms presented also include historic 
and newly described methods. The final selection was designed to provoke 
curiosity and encourage exploration and a wider view of the field. 

1.2 Problem Domains 

Algorithms from the fields of Computational Intelligence, Biologically In- 
spired Computing, and Metaheuristics are applied to difficult problems, to 
which more traditional approaches may not be suited. Michalewicz and 
Fogel propose five reasons why problems may be difficult [37] (page 11): 

• The number of possible solutions in the search space is so large as to 
forbid an exhaustive search for the best answer. 
• The problem is so complicated, that just to facilitate any answer at
all, we have to use such simplified models of the problem that any 
result is essentially useless. 

• The evaluation function that describes the quality of any proposed 
solution is noisy or varies with time, thereby requiring not just a single 
solution but an entire series of solutions. 

• The possible solutions are so heavily constrained that constructing 
even one feasible answer is difficult, let alone searching for an optimal 
solution. 

• The person solving the problem is inadequately prepared or imagines 
some psychological barrier that prevents them from discovering a 
solution. 

This section introduces two problem formalisms that embody many of the 
most difficult problems faced by Artificial and Computational Intelligence. 
They are: Function Optimization and Function Approximation. Each class 
of problem is described in terms of its general properties, a formalism, and 
a set of specialized sub-problems. These problem classes provide a tangible 
framing of the algorithmic techniques described throughout the work. 

1.2.1 Function Optimization 

Real-world optimization problems and generalizations thereof can be drawn 
from most fields of science, engineering, and information technology (for 
a sample [2, 48]). Importantly, function optimization problems have had 
a long tradition in the fields of Artificial Intelligence in motivating basic 
research into new problem solving techniques, and for investigating and 
verifying systemic behavior against benchmark problem instances. 

Problem Description 

Mathematically, optimization is defined as the search for a combination of
parameters commonly referred to as decision variables ($x = \{x_1, x_2, x_3, \ldots, x_n\}$)
which minimize or maximize some ordinal quantity ($c$) (typically a scalar
called a score or cost) assigned by an objective function or cost function ($f$),
under a set of constraints ($g = \{g_1, g_2, g_3, \ldots, g_n\}$). For example, a general
minimization case would be as follows: $f(x') \leq f(x), \forall x_i \in x$. Constraints
may provide boundaries on decision variables (for example in a real-value
hypercube $\Re^n$), or may generally define regions of feasibility and infeasibility
in the decision variable space. In applied mathematics the field may be
referred to as Mathematical Programming. More generally the field may 
be referred to as Global or Function Optimization given the focus on the 
objective function. For more general information on optimization refer to 
Horst et al. [29]. 
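The formalism can be made concrete with a small worked example. The sketch below assumes a hypothetical objective function (a sum of squares), box constraints on two decision variables, and a coarse grid of candidate solutions that is small enough to enumerate; none of these come from the text. It then checks the minimization condition directly against every feasible candidate.

```python
import itertools


def objective(x):
    """Hypothetical objective function f: assigns a scalar cost to a candidate x."""
    return sum(v ** 2 for v in x)


def feasible(x, bounds):
    """Constraint check: every decision variable must lie inside its bounds."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, bounds))


# Box constraints on two decision variables (an assumed feasible region).
bounds = [(-5.0, 5.0), (-5.0, 5.0)]

# A coarse grid from -6.0 to 6.0 in steps of 0.5, so that some points are infeasible.
grid_axis = [i * 0.5 for i in range(-12, 13)]
candidates = [x for x in itertools.product(grid_axis, repeat=2) if feasible(x, bounds)]

# The minimizer x' satisfies f(x') <= f(x) for every feasible candidate x.
x_best = min(candidates, key=objective)

print("feasible candidates:", len(candidates))
print("best candidate:", x_best, "cost:", objective(x_best))
```

Enumeration is only possible here because the grid is tiny; the difficulty described in the remainder of this section comes from search spaces where it is not.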
Sub-Fields of Study

The study of optimization is comprised of many specialized sub-fields, based 
on an overlapping taxonomy that focuses on the principal concerns in the
general formalism. For example, with regard to the decision variables, 
one may consider univariate and multivariate optimization problems. The 
type of decision variables promotes specialities for continuous, discrete, 
and permutations of variables. Dependencies between decision variables 
under a cost function define the fields of Linear Programming, Quadratic 
Programming, and Nonlinear Programming. A large class of optimization 
problems can be reduced to discrete sets and are considered in the field 
of Combinatorial Optimization, to which many theoretical properties are 
known, most importantly that many interesting and relevant problems 
cannot be solved by an approach with polynomial time complexity (so- 
called NP, for example see Papadimitriou and Steiglitz [38]). 

The evaluation of variables against a cost function may collectively
be considered a response surface. The shape of such a response surface
may be convex, which is a class of functions for which many important
theoretical findings have been made, not limited to the fact that locating
a local optimal configuration also means the global optimal configuration
of decision variables has been located [11]. Many interesting and real-world
optimization problems produce cost surfaces that are non-convex or so-called
multi-modal^ (rather than unimodal), suggesting that there are multiple
peaks and valleys. Further, many real-world optimization problems with
continuous decision variables cannot be differentiated given their complexity
or limited information availability, meaning that derivative-based gradient
descent methods (that are well understood) are not applicable, necessitating
the use of so-called 'direct search' (sample or pattern-based) methods [33].
Real-world objective function evaluation may be noisy, discontinuous, and/or
dynamic, and the constraints of real-world problem solving may require
an approximate solution using limited time or resources, motivating the
need for heuristic approaches.
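The consequence of a multi-modal surface can be shown with a derivative-free search started from two different points. The sketch below is an illustration under stated assumptions: the quartic cost function with two valleys and the simple step-halving pattern search are hypothetical choices, not methods prescribed by the text.

```python
def cost(x):
    """Hypothetical non-convex cost with two valleys (near x = -1.47 and x = 1.35)."""
    return x ** 4 - 4.0 * x ** 2 + x


def pattern_search(f, x, step=0.5, tol=1e-6):
    """Derivative-free direct search: probe left and right, halve the step on failure."""
    fx = f(x)
    while step > tol:
        moved = False
        for candidate in (x - step, x + step):
            f_candidate = f(candidate)
            if f_candidate < fx:
                x, fx, moved = candidate, f_candidate, True
                break
        if not moved:
            step /= 2.0
    return x, fx


left, left_cost = pattern_search(cost, -2.0)
right, right_cost = pattern_search(cost, 2.0)
print("from -2.0:", left, left_cost)
print("from +2.0:", right, right_cost)
```

Both runs stop at a point where no small step improves the cost, yet only the run started on the left finds the deeper valley. On a convex surface the two runs would agree, which is why locating a local optimum there is sufficient.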

1.2.2 Function Approximation 

Real-world Function Approximation problems are among the most computa-
tionally difficult considered in the broader field of Artificial Intelligence for
reasons including: incomplete information, high-dimensionality, noise in the
sample observations, and non-linearities in the target function. This section
considers the Function Approximation formalism and related specializations
as a general motivating problem to contrast and compare with Function
Optimization.

^Taken from statistics referring to the centers of mass in distributions, although in
optimization it refers to 'regions of interest' in the search space, in particular valleys in
minimization, and peaks in maximization cost surfaces.
Problem Description

Function Approximation is the problem of finding a function (f) that ap-
proximates a target function (g), where typically the approximated function
is selected based on a sample of observations (x, also referred to as the
training set) taken from the unknown target function. In machine learning,
the function approximation formalism is used to describe general problem 
types commonly referred to as pattern recognition, such as classification, 
clustering, and curve fitting (called a decision or discrimination function). 
Such general problem types are described in terms of approximating an 
unknown Probability Density Function (PDF), which underlies the relation- 
ships in the problem space, and is represented in the sample data. This 
perspective of such problems is commonly referred to as statistical machine 
learning and/or density estimation [8, 24]. 

Sub-Fields of Study 

The function approximation formalism can be used to phrase some of the 
hardest problems faced by Computer Science, and Artificial Intelligence 
in particular, such as natural language processing and computer vision. 
The general process focuses on 1) the collection and preparation of the 
observations from the target function, 2) the selection and/or preparation of 
a model of the target function, and 3) the application and ongoing refinement 
of the prepared model. Some important problem-based sub-fields include: 

• Feature Selection where a feature is considered an aggregation of 
one-or-more attributes, where only those features that have meaning 
in the context of the target function are necessary to the modeling
function [27, 32]. 

• Classification where observations are inherently organized into la- 
belled groups (classes) and a supervised process models an underlying 
discrimination function to classify unobserved samples. 

• Clustering where observations may be organized into groups based 
on underlying common features, although the groups are unlabeled 
requiring a process to model an underlying discrimination function 
without corrective feedback. 

• Curve or Surface Fitting where a model is prepared that provides a 
'best-fit' (called a regression) for a set of observations that may be
used for interpolation over known observations and extrapolation for 
observations outside what has been modeled. 

The field of Function Optimization is related to Function Approxima-
tion, as many sub-problems of Function Approximation may be defined as
optimization problems. Many of the technique paradigms used for function
approximation are differentiated based on the representation and the op-
timization process used to minimize error or maximize effectiveness on a
given approximation problem. The difficulties of Function Approximation
problems center on 1) the nature of the unknown relationships between
attributes and features, 2) the number (dimensionality) of attributes and 
features, and 3) general concerns of noise in such relationships and the 
dynamic availability of samples from the target function. Additional diffi- 
culties include the incorporation of prior knowledge (such as imbalance in 
samples, incomplete information and the variable reliability of data), and 
problems of invariant features (such as transformation, translation, rotation, 
scaling, and skewing of features). 
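The link between the two formalisms can be illustrated with curve fitting phrased as error minimization. In the sketch below the representation is a straight line, the error measure is the sum of squared errors, and the five observations are made-up values for illustration; the closed-form least-squares solution stands in for the optimization process, and other representations or search methods could be substituted.

```python
def sum_squared_error(a, b, xs, ys):
    """Error of the model y = a * x + b over the sample of observations."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


def fit_line(xs, ys):
    """Closed-form least-squares fit of a line, which minimizes the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a = s_xy / s_xx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical noisy observations of an unknown target function.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

a, b = fit_line(xs, ys)
print("slope:", a, "intercept:", b)
print("error at fit:", sum_squared_error(a, b, xs, ys))
print("error with perturbed slope:", sum_squared_error(a + 0.1, b, xs, ys))
```

The fitted parameters are the decision variables and the squared error is the cost function, so moving either parameter away from the fit can only increase the error.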

1.3 Unconventional Optimization 

Not all algorithms described in this book are for optimization, although 
those that are may be referred to as 'unconventional' to differentiate them 
from the more traditional approaches. Examples of traditional approaches 
include (but are not limited to) mathematical optimization algorithms
(such as Newton's method and Gradient Descent that use derivatives to 
locate a local minimum) and direct search methods (such as the Simplex 
method and the Nelder-Mead method that use a search pattern to locate 
optima). Unconventional optimization algorithms are designed for the 
more difficult problem instances, the attributes of which were introduced in 
Section 1.2.1. This section introduces some common attributes of this class 
of algorithm. 

1.3.1 Black Box Algorithms 

Black Box optimization algorithms are those that exploit little, if any, 
information from a problem domain in order to devise a solution. They are 
generalized problem solving procedures that may be applied to a range of 
problems with very little modification [19]. Domain specific knowledge refers 
to known relationships between solution representations and the objective 
cost function. Generally speaking, the less domain specific information 
incorporated into a technique, the more flexible the technique, although the 
less efficient it will be for a given problem. For example, 'random search' is 
the most general black box approach and is also the most flexible requiring 
only the generation of random solutions for a given problem. Random 
search allows resampling of the domain which gives it a worst case behavior 
that is worse than enumerating the entire search domain. In practice, the 
more prior knowledge available about a problem, the more information that 
can be exploited by a technique in order to efficiently locate a solution for 
the problem, heuristically or otherwise. Black box methods are therefore
suited to problems where little information from the problem domain is
available to be exploited by a problem solving approach.

1.3.2 No-Free-Lunch 

The No-Free-Lunch Theorem of search and optimization by Wolpert and
Macready proposes that all black box optimization algorithms perform
equally well at searching for the extremum of a cost function when averaged
over all possible functions [50, 51]. The theorem has caused a lot of pessimism
and misunderstanding, particularly in relation to the evaluation and comparison
of Metaheuristic and Computational Intelligence algorithms.
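
For reference, the central result is usually written in the following form. This is a sketch in the notation commonly used for the theorem, none of which is defined elsewhere in this book: $d_m^y$ is assumed to denote the sequence of cost values observed after $m$ distinct evaluations, $f$ ranges over all cost functions on a finite domain, and $a_1$ and $a_2$ are any two black box algorithms.

```latex
\sum_{f} P(d_m^y \mid f, m, a_1) = \sum_{f} P(d_m^y \mid f, m, a_2)
```

Read this way, any performance measure computed from the observed cost values has the same average over all functions for both algorithms.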

The implication of the theorem is that searching for the 'best' general- 
purpose black box optimization algorithm is irresponsible as no such pro- 
cedure is theoretically possible. No-Free-Lunch applies to stochastic and 
deterministic optimization algorithms as well as to algorithms that learn and 
adjust their search strategy over time. It is independent of the performance 
measure used and the representation selected. Wolpert and Macready's 
original paper was produced at a time when grandiose generalizations were 
being made as to algorithm, representation, or configuration superiority. 
The practical impact of the theory is to encourage practitioners to bound 
claims of applicability for search and optimization algorithms. Wolpert and 
Macready encouraged effort be put into devising practical problem classes 
and into the matching of suitable algorithms to problem classes. Further, 
they compelled practitioners to exploit domain knowledge in optimization 
algorithm application, which is now an axiom in the field. 

1.3.3 Stochastic Optimization 

Stochastic optimization algorithms are those that use randomness to elicit 
non-deterministic behaviors, contrasted to purely deterministic procedures. 
Most algorithms from the fields of Computational Intelligence, Biologically 
Inspired Computation, and Metaheuristics may be considered to belong to the
field of Stochastic Optimization. Algorithms that exploit randomness are not 
random in behavior, rather they sample a problem space in a biased manner, 
focusing on areas of interest and neglecting less interesting areas [45]. A 
class of techniques that focus on the stochastic sampling of a domain, called 
Markov Chain Monte Carlo (MCMC) algorithms, provide good average 
performance, and generally offer a low chance of the worst case performance. 
Such approaches are suited to problems with many coupled degrees of 
freedom, for example large, high-dimensional spaces. MCMC approaches 
involve stochastically sampling from a target distribution function similar 
to Monte Carlo simulation methods using a process that resembles a biased 
Markov chain. 

• Monte Carlo methods are used for selecting a statistical sample to
approximate a given target probability density function and are
traditionally used in statistical physics. Samples are drawn sequentially
and the process may include criteria for rejecting samples and biasing
the sampling locations within high-dimensional spaces.

• Markov Chain processes provide a probabilistic model for state tran- 
sitions or moves within a discrete domain called a walk or a chain of 
steps. A Markov system is only dependent on the current position in 
the domain in order to probabilistically determine the next step in 
the walk. 

MCMC techniques combine these two approaches to solve integration 
and optimization problems in large dimensional spaces by generating sam- 
ples while exploring the space using a Markov chain process, rather than 
sequentially or independently [3]. The step generation is configured to bias 
sampling in more important regions of the domain. Three examples of 
MCMC techniques include the Metropolis-Hastings algorithm, Simulated 
Annealing for global optimization, and the Gibbs sampler which are com- 
monly employed in the fields of physics, chemistry, statistics, and economics. 
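
The combination of Monte Carlo sampling and a Markov chain walk can be sketched with a random-walk Metropolis sampler, the special case of Metropolis-Hastings in which the proposal is symmetric. This is a minimal illustration rather than a reference implementation: the function names, the standard normal target density, the uniform proposal, and the seed are all assumptions chosen for the example.

```ruby
# Unnormalized target density: a standard normal (illustrative choice).
def target_density(x)
  Math.exp(-0.5 * x * x)
end

# Random-walk Metropolis sampler. Each proposal depends only on the current
# position (the Markov property) and is accepted with a probability that
# biases the walk towards regions of higher density.
def metropolis(num_samples, step, rng)
  samples = []
  current = 0.0
  num_samples.times do
    # symmetric uniform proposal in [current - step, current + step)
    proposal = current + (rng.rand * 2.0 - 1.0) * step
    accept_prob = [1.0, target_density(proposal) / target_density(current)].min
    current = proposal if rng.rand < accept_prob
    samples << current # a rejected proposal repeats the current position
  end
  samples
end

samples = metropolis(20_000, 1.0, Random.new(1))
mean = samples.inject(0.0) { |sum, x| sum + x } / samples.size
puts "sample mean=#{mean}"
```

With enough samples the sample mean should settle near zero, the mean of the assumed target distribution.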

1.3.4 Inductive Learning 

Many unconventional optimization algorithms employ a process that includes 
the iterative improvement of candidate solutions against an objective cost 
function. This process of adaptation is generally a method by which the 
process obtains characteristics that improve the system's (candidate solution) 
relative performance in an environment (cost function). This adaptive 
behavior is commonly achieved through a 'selectionist process' of repetition 
of the steps: generation, test, and selection. The use of non-deterministic
processes means that the sampling of the domain (the generation step) is
typically non-parametric, although guided by past experience.
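
The generation, test, and selection steps can be sketched as a small loop. This is only an illustration under stated assumptions: the sum-of-squares cost function, the uniform perturbation used for generation, and the names cost, generate, and adapt are hypothetical and are not taken from any specific algorithm in this book.

```ruby
# Cost function (the 'environment'): sum of squares, minimum at the origin.
def cost(vector)
  vector.inject(0.0) { |sum, x| sum + x * x }
end

# Generation: perturb the retained solution, so that new samples are
# guided by past experience rather than drawn blindly.
def generate(parent, spread, rng)
  parent.map { |x| x + (rng.rand * 2.0 - 1.0) * spread }
end

def adapt(start, iterations, spread, rng)
  best = { :vector => start, :cost => cost(start) }
  iterations.times do
    vector = generate(best[:vector], spread, rng)            # generation
    candidate = { :vector => vector, :cost => cost(vector) } # test
    best = candidate if candidate[:cost] <= best[:cost]      # selection
  end
  best
end

start = [4.0, -3.0]
result = adapt(start, 500, 0.5, Random.new(7))
puts "start cost=#{cost(start)}, adapted cost=#{result[:cost]}"
```

Because selection only ever keeps a candidate that is at least as good as the retained one, the adapted cost can never exceed the starting cost.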

The method of acquiring information is called inductive learning or 
learning from example, where the approach uses the implicit assumption 
that specific examples are representative of the broader information content 
of the environment, specifically with regard to anticipated need. Many 
unconventional optimization approaches maintain a single candidate solution, 
a population of samples, or a compression thereof that provides both an 
instantaneous representation of all of the information acquired by the process, 
and the basis for generating and making future decisions. 

This method of simultaneously acquiring and improving information
from the domain and the optimization of decision making (where to direct
future effort) is called the k-armed bandit (two-armed and multi-armed
bandit) problem from the field of statistical decision making known as game
theory [7, 42]. This formalism considers the capability of a strategy to
allocate available resources proportional to the future payoff the strategy
is expected to receive. The classic example is the 2-armed bandit problem
used by Goldberg to describe the behavior of the genetic algorithm [26]. The
example involves an agent that learns which one of the two slot machines 
provides more return by pulling the handle of each (sampling the domain) 
and biasing future handle pulls proportional to the expected utility, based 
on the probabilistic experience with the past distribution of the payoff. 
The formalism may also be used to understand the properties of inductive 
learning demonstrated by the adaptive behavior of most unconventional 
optimization algorithms. 
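
The two slot machine example can be sketched as follows. This is a hedged illustration, not Goldberg's formulation: the payoff probabilities, the equal starting estimates, and the names pull and play are assumptions made for the example. The agent chooses each arm with probability proportional to its current estimated payoff.

```ruby
# Pull one arm: it pays 1.0 with a fixed probability hidden from the agent.
def pull(arm, payoff_probs, rng)
  rng.rand < payoff_probs[arm] ? 1.0 : 0.0
end

def play(num_pulls, payoff_probs, rng)
  pulls = [1, 1]      # one pretend pull per arm...
  totals = [0.5, 0.5] # ...so both estimates start equal and non-zero
  num_pulls.times do
    estimates = [totals[0] / pulls[0], totals[1] / pulls[1]]
    # bias the next pull in proportion to the estimated payoff of each arm
    prob_arm0 = estimates[0] / (estimates[0] + estimates[1])
    arm = (rng.rand < prob_arm0) ? 0 : 1
    totals[arm] += pull(arm, payoff_probs, rng)
    pulls[arm] += 1
  end
  pulls
end

pulls = play(5000, [0.3, 0.6], Random.new(11))
puts "pulls per arm=#{pulls.inspect}"
```

Over many pulls the better-paying arm should receive the larger share of the effort, while the weaker arm is still sampled often enough to keep its estimate current.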

The stochastic iterative process of generate and test can be computation- 
ally wasteful, potentially re-searching areas of the problem space already 
searched, and requiring many trials or samples in order to achieve a 'good 
enough' solution. The limited use of prior knowledge from the domain
(black box) coupled with the stochastic sampling process means that the
adapted solutions are created without top-down insight or instruction, yet
they can sometimes be interesting, innovative, and even competitive with
decades of human expertise [31].

1.4 Book Organization 

The remainder of this book is organized into two parts: Algorithms, which
describes a large number of techniques in a complete and consistent
manner presented in rough algorithm groups, and Extensions, which reviews
more advanced topics suitable for when a number of algorithms have been
mastered.

1.4.1 Algorithms 

Algorithms are presented in seven groups or kingdoms distilled from the broader
fields of study, each in their own chapter, as follows:

• Stochastic Algorithms that focus on the introduction of randomness
into heuristic methods (Chapter 2).

• Evolutionary Algorithms inspired by evolution by means of natural
selection (Chapter 3).

• Physical Algorithms inspired by physical and social systems
(Chapter 4).

• Probabilistic Algorithms that focus on methods that build models
and estimate distributions in search domains (Chapter 5).

• Swarm Algorithms that focus on methods that exploit the properties
of collective intelligence (Chapter 6).

• Immune Algorithms inspired by the adaptive immune system of
vertebrates (Chapter 7).

• Neural Algorithms inspired by the plasticity and learning qualities of
the human nervous system (Chapter 8).

A given algorithm is more than just a procedure or code listing; each
approach is an island of research. The meta-information that defines the
context of a technique is just as important to understanding and application
as abstract recipes and concrete implementations. A standardized algorithm 
description is adopted to provide a consistent presentation of algorithms 
with a mixture of softer narrative descriptions, programmatic descriptions 
both abstract and concrete, and most importantly useful sources for finding 
out more information about the technique. 

The standardized algorithm description template covers the following
subjects:

• Name: The algorithm name defines the canonical name used to refer 
to the technique, in addition to common aliases, abbreviations, and 
acronyms. The name is used as the heading of an algorithm description. 

• Taxonomy: The algorithm taxonomy defines where a technique fits 
into the field, both the specific sub-fields of Computational Intelligence 
and Biologically Inspired Computation as well as the broader field 
of Artificial Intelligence. The taxonomy also provides a context for 
determining the relationships between algorithms. 

• Inspiration: (where appropriate) The inspiration describes the specific 
system or process that provoked the inception of the algorithm. The 
inspiring system may non-exclusively be natural, biological, physical, 
or social. The description of the inspiring system may include relevant 
domain specific theory, observation, nomenclature, and those salient 
attributes of the system that are somehow abstractly or conceptually 
manifest in the technique. 

• Metaphor: (where appropriate) The metaphor is a description of the 
technique in the context of the inspiring system or a different suitable 
system. The features of the technique are made apparent through 
an analogous description of the features of the inspiring system. The 
explanation through analogy is not expected to be literal, rather the 
method is used as an allegorical communication tool. The inspiring
system is not explicitly described; this is the role of the 'inspiration'
topic, which represents a loose dependency for this topic.

• Strategy: The strategy is an abstract description of the computational 
model. The strategy describes the information processing actions 
a technique shall take in order to achieve an objective, providing a 
logical separation between a computational realization (procedure) and 
an analogous system (metaphor). A given problem solving strategy
may be realized as one of a number of specific algorithms or problem
solving systems.

• Procedure: The algorithmic procedure summarizes the specifics of 
realizing a strategy as a systemized and parameterized computation. 
It outlines how the algorithm is organized in terms of the computation, 
data structures, and representations. 

• Heuristics: The heuristics section describes the commonsense, best 
practice, and demonstrated rules for applying and configuring a pa- 
rameterized algorithm. The heuristics relate to the technical details 
of the technique's procedure and data structures for general classes 
of application (neither specific implementations nor specific problem 
instances). 

• Code Listing: The code listing description provides a minimal but
functional version of the technique implemented with a programming
language. The code description can be typed into a computer and
provide a working execution of the technique. The technique
implementation also includes a minimal problem instance to which it is
applied, and both the problem and algorithm implementations are
complete enough to demonstrate the technique's procedure. The
description is presented as a programming source code listing with a
terse introductory summary.

• References: The references section includes a listing of both primary 
sources of information about the technique as well as useful intro- 
ductory sources for novices to gain a deeper understanding of the 
theory and application of the technique. The description consists 
of hand-selected reference material including books, peer reviewed 
conference papers, and journal articles. 

Source code examples are included in the algorithm descriptions, and
the Ruby Programming Language was selected for use throughout the
book. Ruby was selected because it supports the procedural programming
paradigm, adopted to ensure that examples can be easily ported to
object-oriented and other paradigms. Additionally, Ruby is an interpreted
language, meaning the code can be directly executed without an introduced
compilation step, and it is free to download and use from the Internet
(http://www.ruby-lang.org). Ruby is concise, expressive, and supports
meta-programming features that improve the readability of code examples.

The sample code provides a working version of a given technique for
demonstration purposes. Having a tinker with a technique can really
bring it to life and provide valuable insight into a method. The sample
code is a minimum implementation, providing plenty of opportunity to
explore, extend and optimize. All of the source code for the algorithms
presented in this book is available from the companion website, online at
http://www.CleverAlgorithms.com. All algorithm implementations were
tested with Ruby 1.8.6, 1.8.7 and 1.9.

1.4.2 Extensions 

There are some advanced topics that cannot be meaningfully considered
until one has a firm grasp of a number of algorithms, and these are discussed 
at the back of the book. The Advanced Topics chapter addresses topics such 
as: the use of alternative programming paradigms when implementing clever 
algorithms, methodologies used when devising entirely new approaches, 
strategies to consider when testing clever algorithms, visualizing the behavior 
and results of algorithms, and comparing algorithms based on the results 
they produce using statistical methods. Like the background information 
provided in this chapter, the extensions provide a gentle introduction and 
starting point into some advanced topics, and references for seeking a deeper 
understanding. 

1.5 How to Read this Book 

This book is a reference text that provides a large compendium of algorithm 
descriptions. It is a trusted handbook of practical computational recipes to 
be consulted when one is confronted with difficult function optimization and 
approximation problems. It is also an encompassing guidebook of modern 
heuristic methods that may be browsed for inspiration, exploration, and 
general interest. 

The audience for this work may be interested in the fields of Computa- 
tional Intelligence, Biologically Inspired Computation, and Metaheuristics 
and may count themselves as belonging to one of the following broader 
groups: 

• Scientists: Research scientists concerned with theoretically or
empirically investigating algorithms, addressing questions such as: What
is the motivating system and strategy for a given technique? What
are some algorithms that may be used in a comparison within a given
subfield or across subfields?

• Engineers: Programmers and developers concerned with implementing,
applying, or maintaining algorithms, addressing questions such as:
What is the procedure for a given technique? What are the best practice
heuristics for employing a given technique?

• Students: Undergraduate and graduate students interested in
learning about techniques, addressing questions such as: What are some
interesting algorithms to study? How to implement a given approach?

• Amateurs: Practitioners interested in knowing more about algorithms,
addressing questions such as: What classes of techniques exist and what
algorithms do they provide? How to conceptualize the computation of
a technique?

1.6 Further Reading 

This book is not an introduction to Artificial Intelligence or related sub-fields, 
nor is it a field guide for a specific class of algorithms. This section provides 
some pointers to selected books and articles for those readers seeking a 
deeper understanding of the fields of study to which the Clever Algorithms 
described in this book belong. 

1.6.1 Artificial Intelligence 

Artificial Intelligence is a large field of study and many excellent texts have
been written to introduce the subject. Russell and Norvig's "Artificial
Intelligence: A Modern Approach" is an excellent introductory text providing
a broad and deep review of what the field has to offer and is useful for
students and practitioners alike [43]. Luger and Stubblefield's "Artificial
Intelligence: Structures and Strategies for Complex Problem Solving" is also
an excellent reference text, providing a more empirical approach to the field
[34].

1.6.2 Computational Intelligence 

Introductory books for the field of Computational Intelligence generally
focus on a handful of specific sub-fields and their techniques. Engelbrecht's
"Computational Intelligence: An Introduction" provides a modern and
detailed introduction to the field covering classic subjects such as Evolutionary
Computation and Artificial Neural Networks, as well as more recent
techniques such as Swarm Intelligence and Artificial Immune Systems [20].
Pedrycz's slightly more dated "Computational Intelligence: An Introduction"
also provides a solid coverage of the core of the field with some deeper
insights into fuzzy logic and fuzzy systems [41].

1.6.3 Biologically Inspired Computation 

Computational methods inspired by natural and biological systems
represent a large portion of the algorithms described in this book. The collection
of articles published in de Castro and Von Zuben's "Recent Developments
in Biologically Inspired Computing" provides an overview of the state of
the field, and the introductory chapter on the need for such methods does an
excellent job of motivating the field of study [17]. Forbes's "Imitation of Life:
How Biology Is Inspiring Computing" sets the scene for Natural Computing
and the interrelated disciplines, of which Biologically Inspired Computing
is but one useful example [22]. Finally, Benyus's "Biomimicry: Innovation
Inspired by Nature" provides a good introduction to the broader related
field, a new frontier in science and technology that involves building
systems inspired by an understanding of the biological world [6].

1.6.4 Metaheuristics 

The field of Metaheuristics was initially constrained to heuristics for applying
classical optimization procedures, although it has expanded to encompass a
broader and more diverse set of techniques. Michalewicz and Fogel's "How to
Solve It: Modern Heuristics" provides a practical tour of heuristic methods
with a consistent set of worked examples [37]. Glover and Kochenberger's
"Handbook of Metaheuristics" provides a solid introduction into a broad
collection of techniques and their capabilities [25].

1.6.5 The Ruby Programming Language 

The Ruby Programming Language is a multi-paradigm dynamic language
that appeared in approximately 1995. Its meta-programming capabilities
coupled with concise and readable syntax have made it a popular language
of choice for web development, scripting, and application development.
The classic reference text for the language is Thomas, Fowler, and Hunt's
"Programming Ruby: The Pragmatic Programmers' Guide", referred to as the
'pickaxe book' because of the picture of the pickaxe on the cover [47]. An
updated edition is available that covers version 1.9 (compared to 1.8 in the
cited version) that will work just as well for use as a reference for the examples
in this book. Flanagan and Matsumoto's "The Ruby Programming Language"
also provides a seminal reference text with contributions from Yukihiro
Matsumoto, the author of the language [21]. For more information on the
Ruby Programming Language, see the quick-start guide in Appendix A.

Chapter 2

Stochastic Algorithms 



2.1 Overview 

This chapter describes Stochastic Algorithms. 

2.1.1 Stochastic Optimization 

The majority of the algorithms to be described in this book are comprised 
of probabilistic and stochastic processes. What differentiates the 'stochastic 
algorithms' in this chapter from the remaining algorithms is the specific lack 
of 1) an inspiring system, and 2) a metaphorical explanation. Both 'inspira- 
tion' and 'metaphor' refer to the descriptive elements in the standardized 
algorithm description. 

These described algorithms are predominantly global optimization
algorithms and metaheuristics that manage the application of an embedded
neighborhood exploring (local) search procedure. As such, with the
exception of 'Stochastic Hill Climbing' and 'Random Search', the algorithms may
be considered extensions of the multi-start search (also known as
multi-restart search). This set of algorithms provides various strategies by
which 'better' and varied starting points can be generated and issued to a
neighborhood searching technique for refinement, a process that is repeated
with potentially improving or unexplored areas to search.

2.2 Random Search

Random Search, RS, Blind Random Search, Blind Search, Pure Random 
Search, PRS 

2.2.1 Taxonomy 

Random search belongs to the fields of Stochastic Optimization and Global 
Optimization. Random search is a direct search method as it does not 
require derivatives to search a continuous domain. This base approach is 
related to techniques that provide small improvements such as Directed 
Random Search, and Adaptive Random Search (Section 2.3). 

2.2.2 Strategy 

The strategy of Random Search is to sample solutions from across the entire 
search space using a uniform probability distribution. Each future sample 
is independent of the samples that come before it. 

2.2.3 Procedure 

Algorithm 2.2.1 provides a pseudocode listing of the Random Search Algo- 
rithm for minimizing a cost function. 

Algorithm 2.2.1: Pseudocode for Random Search.
Input: NumIterations, ProblemSize, SearchSpace
Output: Best
Best ← ∅;
foreach iter_i ∈ NumIterations do
  candidate_i ← RandomSolution(ProblemSize, SearchSpace);
  if Cost(candidate_i) < Cost(Best) then
    Best ← candidate_i;
  end
end
return Best;



2.2.4 Heuristics 

• Random search is minimal in that it only requires a candidate solution 
construction routine and a candidate solution evaluation routine, both 
of which may be calibrated using the approach.

• The worst case performance for Random Search for locating the
optima is worse than an Enumeration of the search domain, given
that Random Search has no memory and can blindly resample.

• Random Search can return a reasonable approximation of the optimal 
solution within a reasonable time under low problem dimensionality, 
although the approach does not scale well with problem size (such as 
the number of dimensions). 

• Care must be taken with some problem domains to ensure that random
candidate solution construction is unbiased.

• The results of a Random Search can be used to seed another search 
technique, like a local search technique (such as the Hill Climbing algo- 
rithm) that can be used to locate the best solution in the neighborhood 
of the 'good' candidate solution. 
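
The last heuristic, seeding a local search with the result of a Random Search, can be sketched as below. This is a minimal illustration under stated assumptions: the hill_climb function is a hypothetical stand-in for a local search (a bounded stochastic perturbation that accepts non-worsening moves), and the sample counts, step size, and seed are arbitrary.

```ruby
def objective_function(vector)
  vector.inject(0.0) { |sum, x| sum + x ** 2.0 }
end

def random_vector(minmax, rng)
  minmax.map { |min, max| min + (max - min) * rng.rand }
end

# Stage 1: Random Search keeps the best of num_samples uniform samples.
def random_search(minmax, num_samples, rng)
  best = nil
  num_samples.times do
    vector = random_vector(minmax, rng)
    candidate = { :vector => vector, :cost => objective_function(vector) }
    best = candidate if best.nil? || candidate[:cost] < best[:cost]
  end
  best
end

# Stage 2: a simple hill climber refines the seed, staying within bounds.
def hill_climb(seed, minmax, iterations, step, rng)
  current = seed
  iterations.times do
    vector = Array.new(current[:vector].size) do |i|
      moved = current[:vector][i] + (rng.rand * 2.0 - 1.0) * step
      [[moved, minmax[i][0]].max, minmax[i][1]].min # clamp to the bounds
    end
    candidate = { :vector => vector, :cost => objective_function(vector) }
    current = candidate if candidate[:cost] <= current[:cost]
  end
  current
end

rng = Random.new(3)
bounds = Array.new(2) { [-5.0, 5.0] }
seed = random_search(bounds, 50, rng)
refined = hill_climb(seed, bounds, 200, 0.1, rng)
puts "seed cost=#{seed[:cost]}, refined cost=#{refined[:cost]}"
```

The global sampling locates a promising region cheaply, and the small-step search then exploits the neighborhood of that sample.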

2.2.5 Code Listing 

Listing 2.1 provides an example of the Random Search Algorithm imple- 
mented in the Ruby Programming Language. In the example, the algorithm 
runs for a fixed number of iterations and returns the best candidate solution 
discovered. The example problem is an instance of a continuous function
optimization that seeks $\min f(x)$ where $f = \sum_{i=1}^{n} x_i^2$,
$-5.0 \leq x_i \leq 5.0$ and $n = 2$. The optimal solution for this basin
function is $(v_0, \ldots, v_{n-1}) = 0.0$.

def objective_function(vector)
  return vector.inject(0) {|sum, x| sum + (x ** 2.0)}
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def search(search_space, max_iter)
  best = nil
  max_iter.times do |iter|
    candidate = {}
    candidate[:vector] = random_vector(search_space)
    candidate[:cost] = objective_function(candidate[:vector])
    best = candidate if best.nil? or candidate[:cost] < best[:cost]
    puts " > iteration=#{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_iter = 100
  # execute the algorithm
  best = search(search_space, max_iter)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.1: Random Search in Ruby 



2.2.6 References 

Primary Sources 

There is no seminal specification of the Random Search algorithm, rather 
there are discussions of the general approach and related random search 
methods from the 1950s through to the 1970s. This was around the time 
that pattern and direct search methods were actively researched. Brooks is 
credited with the so-called 'pure random search' [1]. Two seminal reviews 
of 'random search methods' of the time include: Karnopp [2] and perhaps
Kul'chitskii [3].

Learn More 

For overviews of Random Search Methods see Zhigljavsky [9], Solis and 
Wets [4], and also White [7] who provide an insightful review article. Spall 
provides a detailed overview of the field of Stochastic Optimization, including 
the Random Search method [5] (for example, see Chapter 2). For a shorter 
introduction by Spall, see [6] (specifically Section 6.2). Also see Zabinsky 
for another detailed review of the broader field [8].

2.3 Adaptive Random Search 

Adaptive Random Search, ARS, Adaptive Step Size Random Search, ASSRS, 
Variable Step-Size Random Search. 

2.3.1 Taxonomy 

The Adaptive Random Search algorithm belongs to the general set of 
approaches known as Stochastic Optimization and Global Optimization. It 
is a direct search method in that it does not require derivatives to navigate 
the search space. Adaptive Random Search is an extension of the Random 
Search (Section 2.2) and Localized Random Search algorithms. 

2.3.2 Strategy 

The Adaptive Random Search algorithm was designed to address the lim- 
itations of the fixed step size in the Localized Random Search algorithm. 
The strategy for Adaptive Random Search is to continually approximate 
the optimal step size required to reach the global optimum in the search 
space. This is achieved by trialling and adopting smaller or larger step sizes 
only if they result in an improvement in the search performance. 

The strategy of the Adaptive Step Size Random Search algorithm (the
specific technique reviewed) is to trial a larger step in each iteration and 
adopt the larger step if it results in an improved result. Very large step 
sizes are trialled in the same manner although with a much lower frequency. 
This strategy of preferring large moves is intended to allow the technique to 
escape local optima. Smaller step sizes are adopted if no improvement is 
made for an extended period. 

2.3.3 Procedure 

Algorithm 2.3.1 provides a pseudocode listing of the Adaptive Random 
Search Algorithm for minimizing a cost function based on the specification 
for 'Adaptive Step-Size Random Search' by Schumer and Steiglitz [6].

2.3.4 Heuristics 

• Adaptive Random Search was designed for continuous function opti- 
mization problem domains. 

• Candidates with equal cost should be considered improvements to 
allow the algorithm to make progress across plateaus in the response 
surface. 

• Adaptive Random Search may adapt the search direction in addition
to the step size.

• The step size may be adapted for all parameters, or for each parameter
individually.

Algorithm 2.3.1: Pseudocode for Adaptive Random Search.
Input: Iter_max, Problem_size, SearchSpace, StepSize_factor^init,
       StepSize_factor^small, StepSize_factor^large, StepSize_factor^iter,
       NoChange_max
Output: S
NoChange_count ← 0;
StepSize_i ← InitializeStepSize(SearchSpace, StepSize_factor^init);
S ← RandomSolution(Problem_size, SearchSpace);
for i = 0 to Iter_max do
  S_1 ← TakeStep(SearchSpace, S, StepSize_i);
  StepSize_i^large ← 0;
  if i mod StepSize_factor^iter = 0 then
    StepSize_i^large ← StepSize_i × StepSize_factor^large;
  else
    StepSize_i^large ← StepSize_i × StepSize_factor^small;
  end
  S_2 ← TakeStep(SearchSpace, S, StepSize_i^large);
  if Cost(S_1) ≤ Cost(S) ∨ Cost(S_2) ≤ Cost(S) then
    if Cost(S_2) < Cost(S_1) then
      S ← S_2;
      StepSize_i ← StepSize_i^large;
    else
      S ← S_1;
    end
    NoChange_count ← 0;
  else
    NoChange_count ← NoChange_count + 1;
    if NoChange_count > NoChange_max then
      NoChange_count ← 0;
      StepSize_i ← StepSize_i / StepSize_factor^small;
    end
  end
end
return S;

2.3.5 Code Listing 

Listing 2.2 provides an example of the Adaptive Random Search Algorithm
implemented in the Ruby Programming Language, based on the specification
for 'Adaptive Step-Size Random Search' by Schumer and Steiglitz [6].
In the example, the algorithm runs for a fixed number of iterations and
returns the best candidate solution discovered. The example problem is an
instance of a continuous function optimization that seeks $\min f(x)$ where
$f = \sum_{i=1}^{n} x_i^2$, $-5.0 \leq x_i \leq 5.0$ and $n = 2$. The optimal
solution for this basin function is $(v_0, \ldots, v_{n-1}) = 0.0$.

# Sum of squares (basin function), to be minimized.
def objective_function(vector)
  return vector.inject(0) {|sum, x| sum + (x ** 2.0)}
end

def rand_in_bounds(min, max)
  return min + ((max-min) * rand())
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    rand_in_bounds(minmax[i][0], minmax[i][1])
  end
end

# Sample a point within step_size of current, clipped to the bounds.
def take_step(minmax, current, step_size)
  position = Array.new(current.size)
  position.size.times do |i|
    min = [minmax[i][0], current[i]-step_size].max
    max = [minmax[i][1], current[i]+step_size].min
    position[i] = rand_in_bounds(min, max)
  end
  return position
end

# Use the large factor every iter_mult iterations, else the small factor.
def large_step_size(iter, step_size, s_factor, l_factor, iter_mult)
  return step_size * l_factor if iter>0 and iter.modulo(iter_mult) == 0
  return step_size * s_factor
end

def take_steps(bounds, current, step_size, big_stepsize)
  step, big_step = {}, {}
  step[:vector] = take_step(bounds, current[:vector], step_size)
  step[:cost] = objective_function(step[:vector])
  big_step[:vector] = take_step(bounds, current[:vector], big_stepsize)
  big_step[:cost] = objective_function(big_step[:vector])
  return step, big_step
end

def search(max_iter, bounds, init_factor, s_factor, l_factor, iter_mult,
    max_no_impr)
  step_size = (bounds[0][1]-bounds[0][0]) * init_factor
  current, count = {}, 0
  current[:vector] = random_vector(bounds)
  current[:cost] = objective_function(current[:vector])
  max_iter.times do |iter|
    big_stepsize = large_step_size(iter, step_size, s_factor, l_factor,
      iter_mult)
    step, big_step = take_steps(bounds, current, step_size, big_stepsize)
    if step[:cost] <= current[:cost] or big_step[:cost] <= current[:cost]
      if big_step[:cost] <= step[:cost]
        step_size, current = big_stepsize, big_step
      else
        current = step
      end
      count = 0
    else
      count += 1
      # shrink the step size after too many non-improving iterations
      count, step_size = 0, (step_size/s_factor) if count >= max_no_impr
    end
    puts " > iteration #{(iter+1)}, best=#{current[:cost]}"
  end
  return current
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  bounds = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_iter = 1000
  init_factor = 0.05
  s_factor = 1.3
  l_factor = 3.0
  iter_mult = 10
  max_no_impr = 30
  # execute the algorithm
  best = search(max_iter, bounds, init_factor, s_factor, l_factor,
    iter_mult, max_no_impr)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.2: Adaptive Random Search in Ruby 



2.3.6 References 

Primary Sources 

Many works in the 1960s and 1970s experimented with variable step sizes for
Random Search methods. Schumer and Steiglitz are commonly credited with
the adaptive step size procedure, which they called 'Adaptive Step-Size
Random Search' [6]. Their approach only modifies the step size based on an
approximation of the optimal step size required to reach the global optimum.
Kregting and White review adaptive random search methods and propose
an approach called 'Adaptive Directional Random Search' that modifies
both the algorithm's step size and direction in response to the cost function
[2].

Learn More 

White reviews extensions to Rastrigin's 'Creeping Random Search' [4] (fixed 
step size) that use probabilistic step sizes drawn stochastically from uniform 
and probabilistic distributions [7]. White also reviews works that propose 
dynamic control strategies for the step size, such as Karnopp [1] who proposes 
increases and decreases to the step size based on performance over very 
small numbers of trials. Schrack and Choit review random search methods 
that modify their step size in order to approximate optimal moves while 
searching, including the property of reversal [5]. Masri et al. describe an 
adaptive random search strategy that alternates between periods of fixed 
and variable step sizes [3].

2.4 Stochastic Hill Climbing

Stochastic Hill Climbing, SHC, Random Hill Climbing, RHC, Random 
Mutation Hill Climbing, RMHC 

2.4.1 Taxonomy 

The Stochastic Hill Climbing algorithm is a Stochastic Optimization algo- 
rithm and is a Local Optimization algorithm (contrasted to Global Opti- 
mization). It is a direct search technique, as it does not require derivatives 
of the search space. Stochastic Hill Climbing is an extension of deterministic 
hill climbing algorithms such as Simple Hill Climbing (first-best neighbor), 
Steepest-Ascent Hill Climbing (best neighbor), and a parent of approaches
such as Parallel Hill Climbing and Random-Restart Hill Climbing.

2.4.2 Strategy 

The strategy of the Stochastic Hill Climbing algorithm is to iterate the
process of randomly selecting a neighbor for a candidate solution and to
accept it only if it results in an improvement. The strategy was proposed to address the
limitations of deterministic hill climbing techniques that were likely to get 
stuck in local optima due to their greedy acceptance of neighboring moves. 

2.4.3 Procedure 

Algorithm 2.4.1 provides a pseudocode listing of the Stochastic Hill Climbing
algorithm, specifically the Random Mutation Hill Climbing algorithm
described by Forrest and Mitchell, applied to a maximization optimization
problem [3].



Algorithm 2.4.1: Pseudocode for Stochastic Hill Climbing.

Input: Iter_max, ProblemSize
Output: Current

Current ← RandomSolution(ProblemSize);
foreach iter_i ∈ Iter_max do
    Candidate ← RandomNeighbor(Current);
    if Cost(Candidate) ≥ Cost(Current) then
        Current ← Candidate;
    end
end
return Current;

2.4.4 Heuristics

• Stochastic Hill Climbing was designed to be used in discrete domains 
with explicit neighbors such as combinatorial optimization (compared 
to continuous function optimization). 

• The algorithm's strategy may be applied to continuous domains by 
making use of a step-size to define candidate-solution neighbors (such 
as Localized Random Search and Fixed Step-Size Random Search). 

• Stochastic Hill Climbing is a local search technique (compared to 
global search) and may be used to refine a result after the execution 
of a global search algorithm. 

• Even though the technique uses a stochastic process, it can still get 
stuck in local optima. 

• Neighbors with better or equal cost should be accepted, allowing the 
technique to navigate across plateaus in the response surface. 

• The algorithm can be restarted and repeated a number of times after 
it converges to provide an improved result (called Multiple Restart 
Hill Climbing). 

• The procedure can be applied to multiple candidate solutions concur- 
rently, allowing multiple algorithm runs to be performed at the same 
time (called Parallel Hill Climbing). 
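
The restart heuristic above can be sketched as a thin wrapper around a
single hill climbing run. This is a minimal illustration on the One Max
problem; the function names hill_climb and multiple_restart, and the
parameter restarts, are assumptions introduced here and are not taken
from the source material.

```ruby
# Count the '1' bits (the One Max objective, to be maximized).
def onemax(bits)
  bits.count("1")
end

def random_bitstring(num_bits)
  Array.new(num_bits) { rand < 0.5 ? "1" : "0" }
end

# Flip a single randomly chosen bit.
def random_neighbor(bits)
  mutant = bits.dup
  pos = rand(mutant.size)
  mutant[pos] = (mutant[pos] == "1") ? "0" : "1"
  mutant
end

# One run of stochastic hill climbing (accepts equal-cost neighbors).
def hill_climb(num_bits, max_iter)
  current = random_bitstring(num_bits)
  max_iter.times do
    neighbor = random_neighbor(current)
    current = neighbor if onemax(neighbor) >= onemax(current)
  end
  current
end

# Hypothetical wrapper: repeat independent runs and keep the best result.
def multiple_restart(num_bits, max_iter, restarts)
  best = nil
  restarts.times do
    result = hill_climb(num_bits, max_iter)
    best = result if best.nil? || onemax(result) > onemax(best)
  end
  best
end

best = multiple_restart(16, 50, 5)
puts "best cost=#{onemax(best)}, v=#{best.join}"
```

Because the runs are independent, the same wrapper could dispatch them
concurrently, which corresponds to the Parallel Hill Climbing variant.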

2.4.5 Code Listing 

Listing 2.3 provides an example of the Stochastic Hill Climbing algorithm 
implemented in the Ruby Programming Language, specifically the Random 
Mutation Hill Climbing algorithm described by Forrest and Mitchell [3]. 
The algorithm is executed for a fixed number of iterations and is applied to 
a binary string optimization problem called 'One Max'. The objective of
this maximization problem is to prepare a string of all '1' bits, where the
cost function only reports the number of '1' bits in a given string.

def onemax(vector)
  return vector.inject(0.0) {|sum, v| sum + ((v=="1") ? 1 : 0)}
end

def random_bitstring(num_bits)
  return Array.new(num_bits) {|i| (rand<0.5) ? "1" : "0"}
end

# Copy the bit string and flip one randomly selected bit.
def random_neighbor(bitstring)
  mutant = Array.new(bitstring)
  pos = rand(bitstring.size)
  mutant[pos] = (mutant[pos]=='1') ? '0' : '1'
  return mutant
end

def search(max_iterations, num_bits)
  candidate = {}
  candidate[:vector] = random_bitstring(num_bits)
  candidate[:cost] = onemax(candidate[:vector])
  max_iterations.times do |iter|
    neighbor = {}
    neighbor[:vector] = random_neighbor(candidate[:vector])
    neighbor[:cost] = onemax(neighbor[:vector])
    # accept neighbors with better or equal cost
    candidate = neighbor if neighbor[:cost] >= candidate[:cost]
    puts " > iteration #{(iter+1)}, best=#{candidate[:cost]}"
    break if candidate[:cost] == num_bits
  end
  return candidate
end

if __FILE__ == $0
  # problem configuration
  num_bits = 64
  # algorithm configuration
  max_iterations = 1000
  # execute the algorithm
  best = search(max_iterations, num_bits)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].join}"
end



Listing 2.3: Stochastic Hill Climbing in Ruby 



2.4.6 References 
Primary Sources 

Perhaps the most popular implementation of the Stochastic Hill Climbing 
algorithm is by Forrest and Mitchell, who proposed the Random Muta- 
tion Hill Climbing (RMHC) algorithm (with communication from Richard 
Palmer) in a study that investigated the behavior of the genetic algorithm 
on a deceptive class of (discrete) bit-string optimization problems called 
'royal road' functions [3]. The RMHC was compared to two other hill
climbing algorithms in addition to the genetic algorithm, specifically: the
Steepest-Ascent Hill Climber, and the Next-Ascent Hill Climber. This study
was then followed up by Mitchell and Holland [5].

Juels and Wattenberg were also early to consider stochastic hill climbing
as an approach to compare to the genetic algorithm [4]. Skalak applied the
RMHC algorithm to a single long bit-string that represented a number of
prototype vectors for use in classification [8].

Learn More

The Stochastic Hill Climbing algorithm is related to the genetic algorithm
without crossover. Simplified versions of the approach are investigated for
bit-string based optimization problems with the population size of the genetic
algorithm reduced to one. The general technique has been investigated
under the names Iterated Hillclimbing [6], ES(1+1,m,hc) [7], Random Bit
Climber [2], and (1+1)-Genetic Algorithm [1]. The main difference between
RMHC and ES(1+1) is that the latter uses a fixed probability of a mutation
for each discrete element of a solution (meaning the neighborhood size is
probabilistic), whereas RMHC will only stochastically modify one element.
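
The difference between the two neighbor operators can be sketched as
follows. The function names flip_one, flip_each and hamming, and the
mutation rate of one over the string length, are assumptions chosen for
illustration rather than details taken from the cited works.

```ruby
# RMHC-style neighbor: exactly one randomly chosen bit is flipped.
def flip_one(bits)
  mutant = bits.dup
  pos = rand(mutant.size)
  mutant[pos] = (mutant[pos] == "1") ? "0" : "1"
  mutant
end

# ES(1+1)-style neighbor: every bit is flipped independently with
# probability p_mut, so the number of changed bits varies per call.
def flip_each(bits, p_mut)
  bits.map { |b| (rand < p_mut) ? ((b == "1") ? "0" : "1") : b }
end

# Number of positions at which two bit strings differ.
def hamming(a, b)
  a.zip(b).count { |x, y| x != y }
end

parent = Array.new(16) { "0" }
puts hamming(parent, flip_one(parent))            # always 1
puts hamming(parent, flip_each(parent, 1.0 / 16)) # varies from call to call
```

With flip_each the distance to the parent is a random variable, which is
what is meant by a probabilistic neighborhood size.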

2.5 Iterated Local Search

Iterated Local Search, ILS. 

2.5.1 Taxonomy 

Iterated Local Search is a Metaheuristic and a Global Optimization
technique. It is an extension of Multi-Start Search and may be considered a
parent of many two-phase search approaches such as the Greedy Randomized
Adaptive Search Procedure (Section 2.8) and Variable Neighborhood
Search (Section 2.7).

2.5.2 Strategy 

The objective of Iterated Local Search is to improve upon stochastic Multi-
Restart Search by sampling in the broader neighborhood of candidate
solutions and using a Local Search technique to refine solutions to their 
local optima. Iterated Local Search explores a sequence of solutions created 
as perturbations of the current best solution, the result of which is refined 
using an embedded heuristic. 

2.5.3 Procedure 

Algorithm 2.5.1 provides a pseudocode listing of the Iterated Local Search 
algorithm for minimizing a cost function. 



Algorithm 2.5.1: Pseudocode for Iterated Local Search.

Input:
Output: S_best

S_best ← ConstructInitialSolution();
S_best ← LocalSearch();
SearchHistory ← S_best;
while ¬StopCondition() do
    S_candidate ← Perturbation(S_best, SearchHistory);
    S_candidate ← LocalSearch(S_candidate);
    SearchHistory ← S_candidate;
    if AcceptanceCriterion(S_best, S_candidate, SearchHistory) then
        S_best ← S_candidate;
    end
end
return S_best;

2.5.4 Heuristics

• Iterated Local Search was designed for and has been predominately 
applied to discrete domains, such as combinatorial optimization prob- 
lems. 

• The perturbation of the current best solution should be in a neighbor- 
hood beyond the reach of the embedded heuristic and should not be 
easily undone. 

• Perturbations that are too small make the algorithm too greedy;
perturbations that are too large make the algorithm too stochastic.

• The embedded heuristic is most commonly a problem-specific local 
search technique. 

• The starting point for the search may be a randomly constructed 
candidate solution, or constructed using a problem-specific heuristic 
(such as nearest neighbor). 

• Perturbations can be made deterministically, although stochastic and 
probabilistic (adaptive based on history) are the most common. 

• The procedure may store as much or as little history as needed to 
be used during perturbation and acceptance criteria. No history 
represents a random walk in a larger neighborhood of the best solution 
and is the most common implementation of the approach. 

• The simplest and most common acceptance criterion is an improvement
in the cost of constructed candidate solutions.
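
Several of these heuristics can be seen together in a small sketch: no
search history, a perturbation larger than the moves of the embedded
heuristic, and acceptance on strict improvement. The one-dimensional
integer cost function below is a toy problem assumed purely for
illustration, and all function names are hypothetical.

```ruby
# Toy cost (an assumption for illustration): every multiple of 5 is a
# local minimum under +/-1 moves, and x = 0 is the global minimum.
def cost(x)
  x.abs + ((x % 5).zero? ? 0 : 8)
end

# Embedded heuristic: move to the best +/-1 neighbor while it improves.
def local_search(x)
  loop do
    neighbor = [x - 1, x + 1].min_by { |n| cost(n) }
    return x if cost(neighbor) >= cost(x)
    x = neighbor
  end
end

# Perturbation: a random jump of 2 to 4 units, larger than the +/-1
# moves used by the local search.
def perturb(x)
  jump = 2 + rand(3)
  x + (rand < 0.5 ? -jump : jump)
end

# Iterated Local Search without history; the acceptance criterion is
# a strict improvement in cost.
def iterated_local_search(start, max_iter)
  best = local_search(start)
  max_iter.times do
    candidate = local_search(perturb(best))
    best = candidate if cost(candidate) < cost(best)
  end
  best
end

best = iterated_local_search(37, 200)
puts "best x=#{best}, cost=#{cost(best)}"
```

In this toy problem a jump of a single unit would always be undone by
the local search, which is why the perturbation uses larger jumps.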

2.5.5 Code Listing 

Listing 2.4 provides an example of the Iterated Local Search algorithm 
implemented in the Ruby Programming Language. The algorithm is applied 
to the Berlin52 instance of the Traveling Salesman Problem (TSP), taken 
from the TSPLIB. The problem seeks a permutation of the order to visit 
cities (called a tour) that minimizes the total distance traveled. The optimal 
tour distance for the Berlin52 instance is 7542 units.

The Iterated Local Search runs for a fixed number of iterations. The 
implementation is based on a common algorithm configuration for the TSP, 
where a 'double-bridge move' (4-opt) is used as the perturbation technique, 
and a stochastic 2-opt is used as the embedded Local Search heuristic. 
The double-bridge move involves partitioning a permutation into 4 pieces 
(a,b,c,d) and putting it back together in a specific and jumbled ordering 
(a,d,c,b). 
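
A small worked example may make the reordering concrete. The helper
below takes the three cut points as explicit arguments so the result is
reproducible; this signature is an assumption for illustration, whereas
the listing that follows chooses its cut points at random.

```ruby
# Double-bridge move with explicit cut points (hypothetical helper).
# The permutation is split into a, b, c, d and rejoined as a, d, c, b.
def double_bridge(perm, pos1, pos2, pos3)
  a = perm[0...pos1]
  b = perm[pos1...pos2]
  c = perm[pos2...pos3]
  d = perm[pos3..-1]
  a + d + c + b
end

tour = [0, 1, 2, 3, 4, 5, 6, 7]
puts double_bridge(tour, 2, 4, 6).inspect # => [0, 1, 6, 7, 4, 5, 2, 3]
```

Here a = [0, 1], b = [2, 3], c = [4, 5] and d = [6, 7], so every city is
kept and only the order of the last three segments changes.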

# Rounded Euclidean distance between two cities.
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# Total length of the closed tour described by the permutation.
def cost(permutation, cities)
  distance = 0
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# Reverse a randomly selected segment of the tour.
def stochastic_two_opt(permutation)
  perm = Array.new(permutation)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm
end

def local_search(best, cities, max_no_improv)
  count = 0
  begin
    candidate = {:vector=>stochastic_two_opt(best[:vector])}
    candidate[:cost] = cost(candidate[:vector], cities)
    count = (candidate[:cost] < best[:cost]) ? 0 : count+1
    best = candidate if candidate[:cost] < best[:cost]
  end until count >= max_no_improv
  return best
end

# Split into (a,b,c,d) at three random points and rejoin as (a,d,c,b).
def double_bridge_move(perm)
  pos1 = 1 + rand(perm.size / 4)
  pos2 = pos1 + 1 + rand(perm.size / 4)
  pos3 = pos2 + 1 + rand(perm.size / 4)
  p1 = perm[0...pos1] + perm[pos3..perm.size]
  p2 = perm[pos2...pos3] + perm[pos1...pos2]
  return p1 + p2
end

def perturbation(cities, best)
  candidate = {}
  candidate[:vector] = double_bridge_move(best[:vector])
  candidate[:cost] = cost(candidate[:vector], cities)
  return candidate
end

def search(cities, max_iterations, max_no_improv)
  best = {}
  best[:vector] = random_permutation(cities)
  best[:cost] = cost(best[:vector], cities)
  best = local_search(best, cities, max_no_improv)
  max_iterations.times do |iter|
    candidate = perturbation(cities, best)
    candidate = local_search(candidate, cities, max_no_improv)
    best = candidate if candidate[:cost] < best[:cost]
    puts " > iteration #{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iterations = 100
  max_no_improv = 50
  # execute the algorithm
  best = search(berlin52, max_iterations, max_no_improv)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.4: Iterated Local Search in Ruby 

2.5.6 References 

Primary Sources 

The definition and framework for Iterated Local Search was described by
Stützle in his PhD dissertation [12]. Specifically, he proposed constraints on
what constitutes an Iterated Local Search algorithm as 1) a single chain of
candidate solutions, and 2) the method used to improve candidate solutions
occurs within a reduced space by a black-box heuristic. Stützle does not take
credit for the approach, instead highlighting specific instances of Iterated
Local Search from the literature, such as 'iterated descent' [1], 'large-step
Markov chains' [7], 'iterated Lin-Kernighan' [3], 'chained local optimization'
[6], as well as [2] that introduces the principle, and [4] that summarized it
(list taken from [8]).

Learn More

Two early technical reports by Stützle that present applications of Iterated
Local Search include a report on the Quadratic Assignment Problem [10],
and another on the permutation flow shop problem [9]. Stützle and Hoos
also published an early paper studying Iterated Local Search for the TSP
[11]. Lourenco, Martin, and Stützle provide a concise presentation of the
technique, related techniques and the framework, much as it is presented in
Stützle's dissertation [5]. The same authors also present an authoritative
summary of the approach and its applications as a book chapter [8].

2.6 Guided Local Search

Guided Local Search, GLS. 

2.6.1 Taxonomy 

The Guided Local Search algorithm is a Metaheuristic and a Global Op- 
timization algorithm that makes use of an embedded Local Search algo- 
rithm. It is an extension to Local Search algorithms such as Hill Climbing 
(Section 2.4) and is similar in strategy to the Tabu Search algorithm (Sec- 
tion 2.10) and the Iterated Local Search algorithm (Section 2.5). 

2.6.2 Strategy 

The strategy for the Guided Local Search algorithm is to use penalties to
encourage a Local Search technique to escape local optima and discover the
global optimum. A Local Search algorithm is run until it gets stuck in a local
optimum. The features from the local optimum are evaluated and penalized,
the results of which are used in an augmented cost function employed by the
Local Search procedure. The Local Search is repeated a number of times
using the last local optimum discovered and the augmented cost function that
guides exploration away from solutions with features present in discovered
local optima.

2.6.3 Procedure 

Algorithm 2.6.1 provides a pseudocode listing of the Guided Local Search
algorithm for minimization. The Local Search algorithm used by the
Guided Local Search algorithm uses an augmented cost function in the form
$h(s) = g(s) + \lambda \cdot \sum_{i=1}^{M} f_i$, where $h(s)$ is the
augmented cost function, $g(s)$ is the problem cost function, $\lambda$ is
the 'regularization parameter' (a coefficient for scaling the penalties),
$s$ is a locally optimal solution of $M$ features, and $f_i$ is the $i$'th
feature in the locally optimal solution. The augmented cost function is only
used by the local search procedure; the Guided Local Search algorithm uses
the problem-specific cost function without augmentation.

Penalties are only updated for those features in a locally optimal solution
that maximize utility, updated by adding 1 to the penalty for the feature
(a counter). The utility for a feature is calculated as
$U_{feature} = \frac{C_{feature}}{1 + P_{feature}}$, where $U_{feature}$ is
the utility for penalizing a feature (maximizing), $C_{feature}$ is the cost
of the feature, and $P_{feature}$ is the current penalty for the feature.
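
The two formulas can be worked through on a small made-up example. In
this sketch the term summed in the augmented cost is taken to be the
current penalty of each feature present in the solution, which is an
interpretation assumed here; the function names and the numeric values
are likewise hypothetical.

```ruby
# Augmented cost h(s) = g(s) + lambda * (sum of the penalties of the
# features present in s). The trailing underscore avoids clashing with
# Ruby's built-in lambda method.
def augmented_cost(problem_cost, feature_penalties, lambda_)
  problem_cost + lambda_ * feature_penalties.sum
end

# Utility of penalizing a feature: C_feature / (1 + P_feature).
def utility(feature_cost, penalty)
  feature_cost / (1.0 + penalty)
end

# Add 1 to the penalty of every feature whose utility is maximal.
def update_penalties(costs, penalties)
  utilities = costs.zip(penalties).map { |c, pen| utility(c, pen) }
  max = utilities.max
  penalties.each_with_index.map { |pen, i| utilities[i] == max ? pen + 1 : pen }
end

costs     = [10.0, 40.0, 25.0] # made-up feature costs
penalties = [0, 1, 0]          # made-up current penalties
penalties = update_penalties(costs, penalties)
puts penalties.inspect                    # => [0, 1, 1]
puts augmented_cost(75.0, penalties, 2.0) # => 79.0
```

The utilities are 10, 20 and 25, so only the third feature is penalized
even though the second feature is the most costly: its existing penalty
halves its utility, which spreads penalties across features over time.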

Algorithm 2.6.1: Pseudocode for Guided Local Search.

Input: Iter_max, λ
Output: S_best

f_penalties ← ∅;
S_best ← RandomSolution();
foreach Iter_i ∈ Iter_max do
    S_curr ← LocalSearch(S_best, λ, f_penalties);
    f_utilities ← CalculateFeatureUtilities(S_curr, f_penalties);
    f_penalties ← UpdateFeaturePenalties(S_curr, f_penalties, f_utilities);
    if Cost(S_curr) < Cost(S_best) then
        S_best ← S_curr;
    end
end
return S_best;

2.6.4 Heuristics

• The Guided Local Search procedure is independent of the Local
Search procedure embedded within it. A suitable domain-specific
search procedure should be identified and employed.

• The Guided Local Search procedure may need to be executed for 
thousands to hundreds-of-thousands of iterations, each iteration of 
which assumes a run of a Local Search algorithm to convergence. 

• The algorithm was designed for discrete optimization problems where 
a solution is comprised of independently assessable 'features' such as 
Combinatorial Optimization, although it has been applied to continu- 
ous function optimization modeled as binary strings. 

• The $\lambda$ parameter is a scaling factor for feature penalization that
must be in the same proportion to the candidate solution costs from the
specific problem instance to which the algorithm is being applied.
As such, the value for $\lambda$ must be meaningful when used within the
augmented cost function (such as when it is added to a candidate
solution cost in minimization and subtracted from a cost in the case
of a maximization problem).

2.6.5 Code Listing 

Listing 2.5 provides an example of the Guided Local Search algorithm 
implemented in the Ruby Programming Language. The algorithm is applied 
to the Berlin52 instance of the Traveling Salesman Problem (TSP), taken 
from the TSPLIB. The problem seeks a permutation of the order to visit 
cities (called a tour) that minimizes the total distance traveled. The optimal 
tour distance for the Berlin52 instance is 7542 units.

The implementation of the algorithm for the TSP was based on the
configuration specified by Voudouris in [7]. A TSP-specific local search
algorithm is used called 2-opt that selects two points in a permutation and
reconnects the tour, potentially untwisting the tour at the selected points.
The stopping condition for 2-opt was configured to be a fixed number of
non-improving moves.

The equation for setting $\lambda$ for TSP instances is
$\lambda = \alpha \cdot \frac{cost(optima)}{N}$, where $N$ is the number of
cities, $cost(optima)$ is the cost of a local optimum found by a local
search, and $\alpha \in (0, 1]$ (around 0.3 for TSP and 2-opt).
The cost of a local optimum was fixed to the approximated value of 15000
for the Berlin52 instance. The utility function for features (edges) in the
TSP is $U_{edge} = \frac{D_{edge}}{1 + P_{edge}}$, where $U_{edge}$ is the
utility for penalizing an edge (maximizing), $D_{edge}$ is the cost of the
edge (distance between cities) and $P_{edge}$ is the current penalty for
the edge.
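
As a quick check of the arithmetic, the equation can be evaluated for
the values quoted above (alpha of 0.3, a local optimum cost of 15000 and
52 cities). The function name tsp_lambda is an assumption introduced for
this sketch.

```ruby
# lambda = alpha * cost(optima) / N for a TSP instance.
def tsp_lambda(alpha, local_optimum_cost, num_cities)
  alpha * (local_optimum_cost / num_cities.to_f)
end

puts tsp_lambda(0.3, 15000.0, 52).round(2) # => 86.54
```

Each unit of penalty on an edge therefore adds roughly 87 distance units
to the augmented cost of any tour that uses that edge.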

# Rounded Euclidean distance between two cities.
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# Reverse a randomly selected segment of the tour.
def stochastic_two_opt(permutation)
  perm = Array.new(permutation)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm
end

# Tour length, and tour length plus scaled edge penalties.
def augmented_cost(permutation, penalties, cities, lambda)
  distance, augmented = 0, 0
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    c1, c2 = c2, c1 if c2 < c1
    d = euc_2d(cities[c1], cities[c2])
    distance += d
    augmented += d + (lambda * (penalties[c1][c2]))
  end
  return [distance, augmented]
end

def cost(cand, penalties, cities, lambda)
  cost, acost = augmented_cost(cand[:vector], penalties, cities, lambda)
  cand[:cost], cand[:aug_cost] = cost, acost
end

# 2-opt local search guided by the augmented cost.
def local_search(current, cities, penalties, max_no_improv, lambda)
  cost(current, penalties, cities, lambda)
  count = 0
  begin
    candidate = {:vector=> stochastic_two_opt(current[:vector])}
    cost(candidate, penalties, cities, lambda)
    count = (candidate[:aug_cost] < current[:aug_cost]) ? 0 : count+1
    current = candidate if candidate[:aug_cost] < current[:aug_cost]
  end until count >= max_no_improv
  return current
end

# Utility of penalizing each edge in the tour: distance / (1 + penalty).
def calculate_feature_utilities(penal, cities, permutation)
  utilities = Array.new(permutation.size, 0)
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    c1, c2 = c2, c1 if c2 < c1
    utilities[i] = euc_2d(cities[c1], cities[c2]) / (1.0 + penal[c1][c2])
  end
  return utilities
end

# Increment the penalty of every edge with maximum utility.
def update_penalties!(penalties, cities, permutation, utilities)
  max = utilities.max()
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    c1, c2 = c2, c1 if c2 < c1
    penalties[c1][c2] += 1 if utilities[i] == max
  end
  return penalties
end

def search(max_iterations, cities, max_no_improv, lambda)
  current = {:vector=>random_permutation(cities)}
  best = nil
  penalties = Array.new(cities.size){ Array.new(cities.size, 0) }
  max_iterations.times do |iter|
    current=local_search(current, cities, penalties, max_no_improv, lambda)
    utilities=calculate_feature_utilities(penalties,cities,current[:vector])
    update_penalties!(penalties, cities, current[:vector], utilities)
    best = current if best.nil? or current[:cost] < best[:cost]
    puts " > iter=#{(iter+1)}, best=#{best[:cost]}, aug=#{best[:aug_cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iterations = 150
  max_no_improv = 20
  alpha = 0.3
  local_search_optima = 12000.0
  lambda = alpha * (local_search_optima/berlin52.size.to_f)
  # execute the algorithm
  best = search(max_iterations, berlin52, max_no_improv, lambda)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.5: Guided Local Search in Ruby 



2.6.6 References 

Primary Sources 

Guided Local Search emerged from an approach called GENET, which is 
a connectionist approach to constraint satisfaction [6, 13]. Guided Local 
Search was presented by Voudouris and Tsang in a series of technical re- 
ports (that were later published) that described the technique and provided 
example applications of it to constraint satisfaction [8], combinatorial opti-
mization [5, 10], and function optimization [9]. The seminal work on the
technique was Voudouris' PhD dissertation [7]. 



Learn More 

Voudouris and Tsang provide a high-level introduction to the technique [11], 
and a contemporary summary of the approach in Glover and Kochenberger's 
'Handbook of metaheuristics' [12] that includes a review of the technique, 
application areas, and demonstration applications on a diverse set of problem 
instances. Mills et al. elaborated on the approach, devising an 'Extended 
Guided Local Search' (EGLS) technique that added 'aspiration criteria' and 
random moves to the procedure [4], work which culminated in Mills' PhD 
dissertation [3]. Lau and Tsang further extended the approach by integrating
it with a Genetic Algorithm, called the 'Guided Genetic Algorithm' (GGA)
[2], that also culminated in a PhD dissertation by Lau [1].

2.7 Variable Neighborhood Search

Variable Neighborhood Search, VNS.

2.7.1 Taxonomy

Variable Neighborhood Search is a Metaheuristic and a Global Optimization 
technique that manages a Local Search technique. It is related to the
Iterated Local Search algorithm (Section 2.5).

2.7.2 Strategy 

The strategy for the Variable Neighborhood Search involves iterative
exploration of larger and larger neighborhoods for a given local optimum
until an improvement is located, after which the search across expanding
neighborhoods is repeated. The strategy is motivated by three principles:
1) a local minimum for one neighborhood structure may not be a local 
minimum for a different neighborhood structure, 2) a global minimum is a 
local minimum for all possible neighborhood structures, and 3) local minima 
are relatively close to global minima for many problem classes. 

2.7.3 Procedure 

Algorithm 2.7.1 provides a pseudocode listing of the Variable Neighborhood 
Search algorithm for minimizing a cost function. The Pseudocode shows 
that the systematic search of expanding neighborhoods for a local optimum 
is abandoned when a global improvement is achieved (shown with the Break 
jump). 

2.7.4 Heuristics 

• Approximation methods (such as stochastic hill climbing) are suggested 
for use as the Local Search procedure for large problem instances in 
order to reduce the running time. 

• Variable Neighborhood Search has been applied to a very wide array 
of combinatorial optimization problems as well as clustering and 
continuous function optimization problems. 

• The embedded Local Search technique should be specialized to the 
problem type and instance to which the technique is being applied. 

• The Variable Neighborhood Descent (VND) can be embedded in the
Variable Neighborhood Search as the Local Search procedure and
has been shown to be most effective.

Algorithm 2.7.1: Pseudocode for VNS.

Input: Neighborhoods
Output: S_best

S_best ← RandomSolution();
while ¬StopCondition() do
    foreach Neighborhood_i ∈ Neighborhoods do
        Neighborhood_curr ← CalculateNeighborhood(S_best, Neighborhood_i);
        S_candidate ← RandomSolutionInNeighborhood(Neighborhood_curr);
        S_candidate ← LocalSearch(S_candidate);
        if Cost(S_candidate) < Cost(S_best) then
            S_best ← S_candidate;
            Break;
        end
    end
end
return S_best;
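
The control flow of the procedure, in particular the Break that restarts
the scan from the smallest neighborhood, can be sketched on a toy
problem. The one-dimensional integer cost function, the modeling of
neighborhood i as all integers within distance i, and every function
name below are assumptions made for illustration only.

```ruby
# Toy cost (an assumption for illustration): every multiple of 5 is a
# local minimum under +/-1 moves, and x = 0 is the global minimum.
def cost(x)
  x.abs + ((x % 5).zero? ? 0 : 8)
end

# Embedded local search: move to the best +/-1 neighbor while it improves.
def local_search(x)
  loop do
    neighbor = [x - 1, x + 1].min_by { |n| cost(n) }
    return x if cost(neighbor) >= cost(x)
    x = neighbor
  end
end

# Neighborhood i is modeled as all integers within distance i of x.
def random_in_neighborhood(x, radius)
  x + rand(-radius..radius)
end

# Scan the neighborhoods from smallest to largest; on an improvement,
# keep the candidate and restart the scan from the smallest neighborhood.
def variable_neighborhood_search(start, neighborhoods, max_iter)
  best = start
  max_iter.times do
    neighborhoods.each do |radius|
      candidate = local_search(random_in_neighborhood(best, radius))
      if cost(candidate) < cost(best)
        best = candidate
        break
      end
    end
  end
  best
end

best = variable_neighborhood_search(37, 1..6, 100)
puts "best x=#{best}, cost=#{cost(best)}"
```

Small neighborhoods are tried first and are cheap to refine; the larger
ones are only sampled when the smaller ones fail to yield an improvement.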



2.7.5 Code Listing 

Listing 2.6 provides an example of the Variable Neighborhood Search
algorithm implemented in the Ruby Programming Language. The algorithm is
applied to the Berlin52 instance of the Traveling Salesman Problem (TSP),
taken from the TSPLIB. The problem seeks a permutation of the order to
visit cities (called a tour) that minimizes the total distance traveled. The
optimal tour distance for the Berlin52 instance is 7542 units.

The Variable Neighborhood Search uses a stochastic 2-opt procedure as 
the embedded local search. The procedure deletes two edges and reverses 
the sequence in-between the deleted edges, potentially removing 'twists' in 
the tour. The neighborhood structure used in the search is the number of 
times the 2-opt procedure is performed on a permutation, between 1 and 20 
times. The stopping condition for the local search procedure is a maximum 
number of iterations without improvement. The same stop condition is 
employed by the higher-order Variable Neighborhood Search procedure, 
although with a lower boundary on the number of non-improving iterations. 
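
The 2-opt move described above can be sketched deterministically, with the two cut positions passed in rather than drawn at random. The names `two_opt` and `apply_moves` are hypothetical helpers introduced for this illustration and do not appear in the listing.

```ruby
# Reverse the tour segment from position c1 (inclusive) to c2 (exclusive),
# mirroring the slice reversal used by a stochastic 2-opt move.
def two_opt(perm, c1, c2)
  c1, c2 = c2, c1 if c2 < c1
  result = perm.dup
  result[c1...c2] = result[c1...c2].reverse
  result
end

# Apply a fixed sequence of moves; the number of moves plays the role of
# the neighborhood size (an assumption matching the description above).
def apply_moves(perm, moves)
  moves.inject(perm) { |tour, (c1, c2)| two_opt(tour, c1, c2) }
end

tour = [0, 1, 2, 3, 4, 5]
p two_opt(tour, 1, 4)                  # => [0, 3, 2, 1, 4, 5]
p apply_moves(tour, [[1, 4], [2, 6]])  # => [0, 3, 5, 4, 1, 2]
```

In the first call the edges (0,1) and (3,4) are deleted and replaced by (0,3) and (1,4), while the segment between them is traversed in reverse order.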

# rounded Euclidean distance between two city coordinates
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# total length of the closed tour described by the permutation
def cost(perm, cities)
  distance = 0
  perm.each_with_index do |c1, i|
    c2 = (i==perm.size-1) ? perm[0] : perm[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

# random tour built by shuffling the city indices in place
def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# in-place 2-opt move: reverse the segment between two random cut points
def stochastic_two_opt!(perm)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm
end

# stochastic local search that applies `neighborhood` 2-opt moves per step
def local_search(best, cities, max_no_improv, neighborhood)
  count = 0
  begin
    candidate = {}
    candidate[:vector] = Array.new(best[:vector])
    neighborhood.times{stochastic_two_opt!(candidate[:vector])}
    candidate[:cost] = cost(candidate[:vector], cities)
    if candidate[:cost] < best[:cost]
      count, best = 0, candidate
    else
      count += 1
    end
  end until count >= max_no_improv
  return best
end

def search(cities, neighborhoods, max_no_improv, max_no_improv_ls)
  best = {}
  best[:vector] = random_permutation(cities)
  best[:cost] = cost(best[:vector], cities)
  iter, count = 0, 0
  begin
    neighborhoods.each do |neigh|
      # sample a point in the current neighborhood, then refine it
      candidate = {}
      candidate[:vector] = Array.new(best[:vector])
      neigh.times{stochastic_two_opt!(candidate[:vector])}
      candidate[:cost] = cost(candidate[:vector], cities)
      candidate = local_search(candidate, cities, max_no_improv_ls, neigh)
      puts " > iteration #{(iter+1)}, neigh=#{neigh}, best=#{best[:cost]}"
      iter += 1
      if(candidate[:cost] < best[:cost])
        best, count = candidate, 0
        puts "New best, restarting neighborhood search."
        break
      else
        count += 1
      end
    end
  end until count >= max_no_improv
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_no_improv = 50
  max_no_improv_ls = 70
  neighborhoods = 1...20
  # execute the algorithm
  best = search(berlin52, neighborhoods, max_no_improv, max_no_improv_ls)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.6: Variable Neighborhood Search in Ruby 

2.7.6 References 

Primary Sources 

The seminal paper for describing Variable Neighborhood Search was by
Mladenovic and Hansen in 1997 [7], although an early abstract by Mladenovic
is sometimes cited [6]. The approach is explained in terms of three different
variations on the general theme. Variable Neighborhood Descent (VND) 
refers to the use of a Local Search procedure and the deterministic (as 
opposed to stochastic or probabilistic) change of neighborhood size. Reduced 
Variable Neighborhood Search (RVNS) involves performing a stochastic 
random search within a neighborhood and no refinement via a local search 
technique. Basic Variable Neighborhood Search is the canonical approach 
described by Mladenovic and Hansen in the seminal paper. 
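
The deterministic change of neighborhood used by VND can be sketched on a toy problem. The cost array, the `best_neighbor` helper, and the use of step sizes as neighborhoods are assumptions made for illustration only; this is not the formulation from the cited papers.

```ruby
# Cheapest position reachable from x by a move of exactly `step` places,
# or nil when no such position exists.
def best_neighbor(x, step, costs)
  candidates = [x - step, x + step].select { |n| n >= 0 && n < costs.size }
  candidates.min_by { |n| costs[n] }
end

# Variable Neighborhood Descent over an ordered list of step sizes.
def vnd(start, steps, costs)
  current = start
  k = 0
  while k < steps.size
    neighbor = best_neighbor(current, steps[k], costs)
    if neighbor && costs[neighbor] < costs[current]
      current = neighbor
      k = 0    # improvement found: return to the first neighborhood
    else
      k += 1   # no improvement: move on to the next neighborhood
    end
  end
  current
end

costs = [5, 3, 4, 6, 2, 7, 1, 8, 0]
puts vnd(1, [1], costs)        # 1: stuck with only the smallest neighborhood
puts vnd(1, [1, 2, 3], costs)  # 8: larger neighborhoods reach the global minimum
```

No randomness is involved: the neighborhoods are tried in a fixed order and the best neighbor is always taken, which is what distinguishes this variant from the reduced and basic forms.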

Learn More 

There are a large number of papers published on Variable Neighborhood
Search, its applications and variations. Hansen and Mladenovic provide an
overview of the approach that includes its recent history, extensions and a
detailed review of the numerous areas of application [4]. For some additional
useful overviews of the technique, its principles, and applications, see [1-3].

There are many extensions to Variable Neighborhood Search. Some
popular examples include: Variable Neighborhood Decomposition Search
(VNDS) that involves embedding a second heuristic or metaheuristic
approach in VNS to replace the Local Search procedure [5], Skewed Variable
Neighborhood Search (SVNS) that encourages exploration of neighborhoods
far away from discovered local optima, and Parallel Variable Neighborhood
Search (PVNS) that either parallelizes the local search of a neighborhood
or parallelizes the searching of the neighborhoods themselves.

[3] P. Hansen and N. Mladenovic. Handbook of Applied Optimization,
chapter Variable neighbourhood search, pages 221-234. Oxford University
Press, 2002.

[4] P. Hansen and N. Mladenovic. Handbook of Metaheuristics, chapter 6:
Variable Neighborhood Search, pages 145-184. Springer, 2003.

[5] P. Hansen, N. Mladenovic, and D. Perez-Britos. Variable neighborhood
decomposition search. Journal of Heuristics, 7(4):335-350, 2001.

2.8 Greedy Randomized Adaptive Search

Greedy Randomized Adaptive Search Procedure, GRASP. 

2.8.1 Taxonomy 

The Greedy Randomized Adaptive Search Procedure is a Metaheuristic
and Global Optimization algorithm, originally proposed for Operations
Research practitioners. The iterative application of an embedded Local
Search technique relates the approach to Iterated Local Search (Section 2.5)
and Multi-Start techniques.

2.8.2 Strategy 

The objective of the Greedy Randomized Adaptive Search Procedure is to
repeatedly sample stochastically greedy solutions, and then use a local search
procedure to refine them to a local optimum. The strategy of the procedure
is centered on the stochastic and greedy step-wise construction mechanism
that constrains the selection and order-of-inclusion of the components of a
solution based on the value they are expected to provide.

2.8.3 Procedure 

Algorithm 2.8.1 provides a pseudocode listing of the Greedy Randomized 
Adaptive Search Procedure for minimizing a cost function. 

Algorithm 2.8.1: Pseudocode for the GRASP.

Input: α
Output: S_best

S_best ← ConstructRandomSolution();
while ¬StopCondition() do
  S_candidate ← GreedyRandomizedConstruction(α);
  S_candidate ← LocalSearch(S_candidate);
  if Cost(S_candidate) < Cost(S_best) then
    S_best ← S_candidate;
  end
end
return S_best;



Algorithm 2.8.2 provides the pseudocode for the Greedy Randomized
Construction function. The function involves the step-wise construction of a
candidate solution using a stochastically greedy construction process. The
function works by building a Restricted Candidate List (RCL) that constrains
the components of a solution (features) that may be selected from
each cycle. The RCL may be constrained by an explicit size, or by using
a threshold (α ∈ [0, 1]) on the cost of adding each feature to the current
candidate solution.
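
The threshold form of the RCL can be sketched in a few lines. The method name `restricted_candidate_list` and the feature costs below are hypothetical and chosen only to make the arithmetic easy to follow.

```ruby
# Keep every feature whose cost is within alpha of the cheapest feature,
# measured as a fraction of the range between the cheapest and the dearest.
def restricted_candidate_list(costs, alpha)
  min, max = costs.values.min, costs.values.max
  threshold = min + alpha * (max - min)
  costs.select { |_feature, c| c <= threshold }.keys
end

costs = { a: 10, b: 12, c: 20, d: 30 }   # hypothetical feature costs
p restricted_candidate_list(costs, 0.0)  # => [:a]
p restricted_candidate_list(costs, 0.3)  # => [:a, :b]
p restricted_candidate_list(costs, 1.0)  # => [:a, :b, :c, :d]
```

With a threshold of 0.3 the cut-off is 10 + 0.3 * (30 - 10) = 16, so only the two cheapest features remain eligible for random selection.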



Algorithm 2.8.2: Pseudocode for the GreedyRandomizedConstruction function.

Input: α
Output: S_candidate

S_candidate ← ∅;
while S_candidate ≠ ProblemSize do
  Feature_costs ← ∅;
  for Feature_i ∉ S_candidate do
    Feature_costs ← CostOfAddingFeatureToSolution(S_candidate, Feature_i);
  end
  RCL ← ∅;
  Fcost_min ← MinCost(Feature_costs);
  Fcost_max ← MaxCost(Feature_costs);
  for F_i cost ∈ Feature_costs do
    if F_i cost ≤ Fcost_min + α · (Fcost_max − Fcost_min) then
      RCL ← Feature_i;
    end
  end
  S_candidate ← SelectRandomFeature(RCL);
end
return S_candidate;



2.8.4 Heuristics 

• The α threshold defines the amount of greediness of the construction
mechanism, where values close to 0 may be too greedy, and values
close to 1 may be too generalized.

• As an alternative to using the α threshold, the RCL can be constrained
to the top n% of candidate features that may be selected from each
construction cycle.

• The technique was designed for discrete problem classes such as
combinatorial optimization problems.
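
The top n% alternative mentioned in the list above can be sketched as follows. The method name `rcl_top_percent`, the rounding up, and the rule of always keeping at least one feature are assumptions introduced for this example.

```ruby
# Keep the cheapest `percent` of features (rounded up, never fewer than one).
def rcl_top_percent(costs, percent)
  keep = [(costs.size * percent / 100.0).ceil, 1].max
  costs.sort_by { |_feature, c| c }.first(keep).map { |feature, _c| feature }
end

costs = { c: 20, a: 10, d: 30, b: 12 }  # hypothetical feature costs
p rcl_top_percent(costs, 50)            # => [:a, :b]
p rcl_top_percent(costs, 10)            # => [:a]
```

Unlike the α threshold, this form fixes how many features are eligible in each cycle regardless of how their costs are spread.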

2.8.5 Code Listing 

Listing 2.7 provides an example of the Greedy Randomized Adaptive Search
Procedure implemented in the Ruby Programming Language. The algorithm
is applied to the Berlin52 instance of the Traveling Salesman Problem (TSP),
taken from the TSPLIB. The problem seeks a permutation of the order to
visit cities (called a tour) that minimizes the total distance traveled. The
optimal tour distance for the Berlin52 instance is 7542 units.

The stochastic and greedy step-wise construction of a tour involves
evaluating candidate cities by the cost they contribute as being the
next city in the tour. The algorithm uses a stochastic 2-opt procedure for
the Local Search with a fixed number of non-improving iterations as the
stopping condition.

# rounded Euclidean distance between two city coordinates
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# total length of the closed tour described by the permutation
def cost(perm, cities)
  distance = 0
  perm.each_with_index do |c1, i|
    c2 = (i==perm.size-1) ? perm[0] : perm[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

# 2-opt move on a copy: reverse the segment between two random cut points
def stochastic_two_opt(permutation)
  perm = Array.new(permutation)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm
end

def local_search(best, cities, max_no_improv)
  count = 0
  begin
    candidate = {:vector=>stochastic_two_opt(best[:vector])}
    candidate[:cost] = cost(candidate[:vector], cities)
    count = (candidate[:cost] < best[:cost]) ? 0 : count+1
    best = candidate if candidate[:cost] < best[:cost]
  end until count >= max_no_improv
  return best
end

def construct_randomized_greedy_solution(cities, alpha)
  candidate = {}
  candidate[:vector] = [rand(cities.size)]
  allCities = Array.new(cities.size) {|i| i}
  while candidate[:vector].size < cities.size
    candidates = allCities - candidate[:vector]
    # cost of visiting each remaining city next; the city is looked up
    # through candidates[i] so that costs[i] matches candidates[i]
    costs = Array.new(candidates.size) do |i|
      euc_2d(cities[candidate[:vector].last], cities[candidates[i]])
    end
    # restricted candidate list: cities within alpha of the cheapest option
    rcl, max, min = [], costs.max, costs.min
    costs.each_with_index do |c,i|
      rcl << candidates[i] if c <= (min + alpha*(max-min))
    end
    candidate[:vector] << rcl[rand(rcl.size)]
  end
  candidate[:cost] = cost(candidate[:vector], cities)
  return candidate
end

def search(cities, max_iter, max_no_improv, alpha)
  best = nil
  max_iter.times do |iter|
    candidate = construct_randomized_greedy_solution(cities, alpha)
    candidate = local_search(candidate, cities, max_no_improv)
    best = candidate if best.nil? or candidate[:cost] < best[:cost]
    puts " > iteration #{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iter = 50
  max_no_improv = 50
  greediness_factor = 0.3
  # execute the algorithm
  best = search(berlin52, max_iter, max_no_improv, greediness_factor)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.7: Greedy Randomized Adaptive Search Procedure in Ruby 



2.8.6 References 

Primary Sources 

The seminal paper that introduces the general approach of stochastic and 
greedy step-wise construction of candidate solutions is by Feo and Resende 
[3]. The general approach was inspired by greedy heuristics by Hart and 
Shogan [9]. The seminal review paper that is cited with the preliminary 
paper is by Feo and Resende [4], and provides a coherent description 
of the GRASP technique, an example, and review of early applications.
An early application was by Feo, Venkatraman and Bard for a machine 
scheduling problem [7]. Other early applications to scheduling problems 
include technical reports [2] (later published as [1]) and [5] (also later 
published as [6]). 

Learn More 

There are a vast number of review, application, and extension papers for
GRASP. Pitsoulis and Resende provide an extensive contemporary overview
of the field as a review chapter [11], as do Resende and Ribeiro, whose chapter
includes a clear presentation of the use of the α threshold parameter instead
of a fixed size for the RCL [13]. Festa and Resende provide an annotated
bibliography as a review chapter that provides some needed insight into the
large amount of study that has gone into the approach [8]. There are numerous
extensions to GRASP, not limited to the popular Reactive GRASP for
adapting α [12], the use of long term memory to allow the technique to
learn from candidate solutions discovered in previous iterations, and parallel
implementations of the procedure such as 'Parallel GRASP' [10].

2.9.1 Taxonomy

Scatter Search is a Metaheuristic and a Global Optimization algorithm. It is
also sometimes associated with the field of Evolutionary Computation given 
the use of a population and recombination in the structure of the technique. 
Scatter Search is a sibling of Tabu Search (Section 2.10), developed by the 
same author and based on similar origins. 

2.9.2 Strategy 

The objective of Scatter Search is to maintain a set of diverse and
high-quality candidate solutions. The principle of the approach is that useful
information about the global optima is stored in a diverse and elite set of
solutions (the reference set) and that recombining samples from the set
can exploit this information. The strategy involves an iterative process
in which a population of diverse and high-quality candidate solutions is
partitioned into subsets and linearly recombined to create weighted
centroids of sample-based neighborhoods. The results of recombination
are refined using an embedded heuristic and assessed in the context of the
reference set as to whether or not they are retained.
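
One way to read "weighted centroid" is as a weighted average of the vectors in a subset. The sketch below makes that reading concrete; the method name `weighted_centroid` and the choice of weights are assumptions made for illustration and are not taken from the source template.

```ruby
# Weighted average of a subset of solution vectors, one value per dimension.
def weighted_centroid(vectors, weights)
  total = weights.inject(0.0) { |s, w| s + w }
  Array.new(vectors.first.size) do |i|
    vectors.zip(weights).inject(0.0) { |s, (v, w)| s + v[i] * w } / total
  end
end

subset = [[0.0, 0.0], [4.0, 2.0]]     # two hypothetical reference solutions
p weighted_centroid(subset, [1, 1])   # => [2.0, 1.0]
p weighted_centroid(subset, [3, 1])   # => [1.0, 0.5]
```

Equal weights give the midpoint of the subset, while unequal weights pull the new candidate toward the more heavily weighted member, for example the one with the better cost.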

2.9.3 Procedure 

Algorithm 2.9.1 provides a pseudocode listing of the Scatter Search algorithm
for minimizing a cost function. The procedure is based on the abstract form
presented by Glover as a template for the general class of technique [3], with
influences from an application of the technique to function optimization by
Glover [3].

2.9.4 Heuristics 

• Scatter Search is suitable for both discrete domains such as combinatorial
optimization as well as continuous domains such as non-linear
programming (continuous function optimization).

• Small set sizes are preferred for the ReferenceSet, such as 10 or 20
members.

• Subset sizes can be 2, 3, 4 or more members that are all recombined 
to produce viable candidate solutions within the neighborhood of the 
members of the subset. 

Algorithm 2.9.1: Pseudocode for Scatter Search.

Input: DiverseSet_size, ReferenceSet_size
Output: ReferenceSet

InitialSet ← ConstructInitialSolution(DiverseSet_size);
RefinedSet ← ∅;
for S_i ∈ InitialSet do
  RefinedSet ← LocalSearch(S_i);
end
ReferenceSet ← SelectInitialReferenceSet(ReferenceSet_size);
while ¬StopCondition() do
  Subsets ← SelectSubset(ReferenceSet);
  CandidateSet ← ∅;
  for Subset_i ∈ Subsets do
    RecombinedCandidates ← RecombineMembers(Subset_i);
    for S_i ∈ RecombinedCandidates do
      CandidateSet ← LocalSearch(S_i);
    end
  end
  ReferenceSet ← Select(ReferenceSet, CandidateSet, ReferenceSet_size);
end
return ReferenceSet;



• Each subset should comprise at least one member added to the set in 
the previous algorithm iteration. 

• The Local Search procedure should be a problem-specific improvement 
heuristic. 

• The selection of members for the ReferenceSet at the end of each 
iteration favors solutions with higher quality and may also promote 
diversity. 

• The ReferenceSet may be updated at the end of an iteration, or
dynamically as candidates are created (a so-called steady-state
population in some evolutionary computation literature).

• A lack of changes to the ReferenceSet may be used as a signal to 
stop the current search, and potentially restart the search with a newly 
initialized ReferenceSet. 



2.9.5 Code Listing 

Listing 2.8 provides an example of the Scatter Search algorithm implemented
in the Ruby Programming Language. The example problem is an instance of
a continuous function optimization that seeks $\min f(x)$ where
$f = \sum_{i=1}^{n} x_i^2$, $-5.0 \le x_i \le 5.0$ and $n = 3$. The optimal
solution for this basin function is $(v_1, \ldots, v_n) = 0.0$.

The algorithm is an implementation of Scatter Search as described in
an application of the technique to unconstrained non-linear optimization by
Glover [6]. The seeds for initial solutions are generated as random vectors,
as opposed to stratified samples. The example was further simplified by
not including a restart strategy, and by excluding diversity maintenance
in the ReferenceSet. A stochastic local search algorithm is used as the
embedded heuristic that uses a stochastic step size in the range of half a
percent of the search space.
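
The objective and the "half a percent" step size can be checked with a little arithmetic. The helper name `step_size_for` is hypothetical; the objective mirrors the sum-of-squares function described above.

```ruby
# Sum-of-squares basin function.
def objective_function(vector)
  vector.inject(0.0) { |sum, x| sum + x**2 }
end

# Step size as a fraction of the width of the first dimension's bounds.
def step_size_for(bounds, fraction)
  (bounds[0][1] - bounds[0][0]) * fraction
end

bounds = Array.new(3) { [-5.0, 5.0] }
puts objective_function([0.0, 0.0, 0.0])   # 0.0 at the optimum
puts objective_function([1.0, -2.0, 3.0])  # 1 + 4 + 9 = 14.0
puts step_size_for(bounds, 0.005)          # approximately 0.05
```

Each dimension spans 10 units, so half a percent of that range is a step of roughly 0.05 in either direction.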

def objective_function(vector)
  return vector.inject(0) {|sum, x| sum + (x ** 2.0)}
end

def rand_in_bounds(min, max)
  return min + ((max-min) * rand())
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    rand_in_bounds(minmax[i][0], minmax[i][1])
  end
end

# random step of at most step_size per dimension, clipped to the bounds
def take_step(minmax, current, step_size)
  position = Array.new(current.size)
  position.size.times do |i|
    min = [minmax[i][0], current[i]-step_size].max
    max = [minmax[i][1], current[i]+step_size].min
    position[i] = rand_in_bounds(min, max)
  end
  return position
end

def local_search(best, bounds, max_no_improv, step_size)
  count = 0
  begin
    candidate = {:vector=>take_step(bounds, best[:vector], step_size)}
    candidate[:cost] = objective_function(candidate[:vector])
    count = (candidate[:cost] < best[:cost]) ? 0 : count+1
    best = candidate if candidate[:cost] < best[:cost]
  end until count >= max_no_improv
  return best
end

def construct_initial_set(bounds, set_size, max_no_improv, step_size)
  diverse_set = []
  begin
    cand = {:vector=>random_vector(bounds)}
    cand[:cost] = objective_function(cand[:vector])
    cand = local_search(cand, bounds, max_no_improv, step_size)
    diverse_set << cand if !diverse_set.any? {|x| x[:vector]==cand[:vector]}
  end until diverse_set.size == set_size
  return diverse_set
end

def euclidean_distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

# summed distance from vector v to every member of the set
def distance(v, set)
  return set.inject(0){|s,x| s + euclidean_distance(v, x[:vector])}
end

# reference set: the num_elite best solutions plus the most distant others
def diversify(diverse_set, num_elite, ref_set_size)
  diverse_set.sort!{|x,y| x[:cost] <=> y[:cost]}
  ref_set = Array.new(num_elite){|i| diverse_set[i]}
  remainder = diverse_set - ref_set
  remainder.each{|c| c[:dist] = distance(c[:vector], ref_set)}
  remainder.sort!{|x,y| y[:dist]<=>x[:dist]}
  ref_set = ref_set + remainder.first(ref_set_size-ref_set.size)
  return [ref_set, ref_set[0]]
end

# pairs that include at least one member added in the previous iteration
def select_subsets(ref_set)
  additions = ref_set.select{|c| c[:new]}
  remainder = ref_set - additions
  remainder = additions if remainder.nil? or remainder.empty?
  subsets = []
  additions.each do |a|
    remainder.each{|r| subsets << [a,r] if a!=r && !subsets.include?([r,a])}
  end
  return subsets
end

def recombine(subset, minmax)
  a, b = subset
  d = rand(euclidean_distance(a[:vector], b[:vector]))/2.0
  children = []
  subset.each do |p|
    step = (rand<0.5) ? +d : -d
    child = {:vector=>Array.new(minmax.size)}
    child[:vector].each_index do |i|
      child[:vector][i] = p[:vector][i] + step
      child[:vector][i]=minmax[i][0] if child[:vector][i]<minmax[i][0]
      child[:vector][i]=minmax[i][1] if child[:vector][i]>minmax[i][1]
    end
    child[:cost] = objective_function(child[:vector])
    children << child
  end
  return children
end

def explore_subsets(bounds, ref_set, max_no_improv, step_size)
  was_change = false
  subsets = select_subsets(ref_set)
  ref_set.each{|c| c[:new] = false}
  subsets.each do |subset|
    candidates = recombine(subset, bounds)
    improved = Array.new(candidates.size) do |i|
      local_search(candidates[i], bounds, max_no_improv, step_size)
    end
    improved.each do |c|
      if !ref_set.any? {|x| x[:vector]==c[:vector]}
        c[:new] = true
        ref_set.sort!{|x,y| x[:cost] <=> y[:cost]}
        # replace the worst member when the candidate is better
        if c[:cost] < ref_set.last[:cost]
          ref_set.delete(ref_set.last)
          ref_set << c
          puts "  >> added, cost=#{c[:cost]}"
          was_change = true
        end
      end
    end
  end
  return was_change
end

def search(bounds, max_iter, ref_set_size, div_set_size, max_no_improv,
    step_size, max_elite)
  diverse_set = construct_initial_set(bounds, div_set_size, max_no_improv,
    step_size)
  ref_set, best = diversify(diverse_set, max_elite, ref_set_size)
  ref_set.each{|c| c[:new] = true}
  max_iter.times do |iter|
    was_change = explore_subsets(bounds, ref_set, max_no_improv, step_size)
    ref_set.sort!{|x,y| x[:cost] <=> y[:cost]}
    best = ref_set.first if ref_set.first[:cost] < best[:cost]
    puts " > iter=#{(iter+1)}, best=#{best[:cost]}"
    break if !was_change
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 3
  bounds = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_iter = 100
  step_size = (bounds[0][1]-bounds[0][0])*0.005
  max_no_improv = 30
  ref_set_size = 10
  diverse_set_size = 20
  no_elite = 5
  # execute the algorithm
  best = search(bounds, max_iter, ref_set_size, diverse_set_size,
    max_no_improv, step_size, no_elite)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.8: Scatter Search in Ruby 

2.9.6 References

Primary Sources 

A form of the Scatter Search algorithm was proposed by Glover for integer
programming [1], based on Glover's earlier work on surrogate constraints.
The approach remained idle until it was revisited by Glover and combined
with Tabu Search [2]. The modern canonical reference of the approach was
proposed by Glover who provides an abstract template of the procedure
that may be specialized for a given application domain [3].

Learn More 

The primary reference for the approach is the book by Laguna and Martí
that reviews the principles of the approach in detail and presents tutorials on
applications of the approach on standard problems using the C programming
language [7]. There are many review articles and chapters on Scatter Search
that may be used to supplement an understanding of the approach, such
as a detailed review chapter by Glover [4], a review of the fundamentals of
the approach and its relationship to an abstraction called 'path linking' by
Glover, Laguna, and Martí [5], and a modern overview of the technique by
Martí, Laguna, and Glover [8].

2.10 Tabu Search

Tabu Search, TS, Taboo Search. 

2.10.1 Taxonomy 

Tabu Search is a Global Optimization algorithm and a Metaheuristic or 
Meta-strategy for controlling an embedded heuristic technique. Tabu Search 
is a parent for a large family of derivative approaches that introduce memory 
structures in Metaheuristics, such as Reactive Tabu Search (Section 2.11) 
and Parallel Tabu Search. 

2.10.2 Strategy 

The objective for the Tabu Search algorithm is to constrain an embedded
heuristic from returning to recently visited areas of the search space, referred
to as cycling. The strategy of the approach is to maintain a short term
memory of the specific changes of recent moves within the search space
and to prevent future moves from undoing those changes. Additional
intermediate-term memory structures may be introduced to bias moves
toward promising areas of the search space, as well as longer-term memory
structures that promote a general diversity in the search across the search
space.
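
The short term memory can be sketched as a fixed-size, first-in first-out list of recently changed features. The `TabuList` class below is a hypothetical illustration; treating edges as undirected (by sorting their endpoints) is an assumption made for this sketch.

```ruby
# Fixed-size short term memory of recently changed edges.
class TabuList
  def initialize(max_size)
    @max_size = max_size
    @edges = []
  end

  # Record an edge and forget the oldest entries once capacity is exceeded.
  def add(edge)
    @edges.push(edge.sort)
    @edges.shift while @edges.size > @max_size
  end

  # An edge is tabu while it is still held in memory.
  def tabu?(edge)
    @edges.include?(edge.sort)
  end
end

memory = TabuList.new(2)
memory.add([1, 2])
memory.add([3, 4])
puts memory.tabu?([2, 1])  # true: recently changed, in either direction
memory.add([5, 6])
puts memory.tabu?([1, 2])  # false: the oldest entry has been forgotten
```

The size of the list controls how long a change stays forbidden: a larger list prevents cycling over longer sequences of moves at the price of restricting the search more.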

2.10.3 Procedure 

Algorithm 2.10.1 provides a pseudocode listing of the Tabu Search algorithm 
for minimizing a cost function. The listing shows the simple Tabu Search 
algorithm with short term memory, without intermediate and long term 
memory management. 

2.10.4 Heuristics 

• Tabu Search was designed to manage an embedded hill climbing
heuristic, although it may be adapted to manage any neighborhood
exploration heuristic.

• Tabu Search was designed for, and has predominantly been applied to,
discrete domains such as combinatorial optimization problems.

• Candidates for neighboring moves can be generated deterministically 
for the entire neighborhood or the neighborhood can be stochastically 
sampled to a fixed size, trading off efficiency for accuracy. 

• Intermediate-term memory structures can be introduced (complement- 
ing the short-term memory) to focus the search on promising areas of 
the search space (intensification), called aspiration criteria. 

Algorithm 2.10.1: Pseudocode for Tabu Search.

Input: TabuList_size
Output: S_best

S_best ← ConstructInitialSolution();
TabuList ← ∅;
while ¬StopCondition() do
  CandidateList ← ∅;
  for S_candidate ∈ Sbest_neighborhood do
    if ¬ContainsAnyFeatures(S_candidate, TabuList) then
      CandidateList ← S_candidate;
    end
  end
  S_candidate ← LocateBestCandidate(CandidateList);
  if Cost(S_candidate) < Cost(S_best) then
    S_best ← S_candidate;
    TabuList ← FeatureDifferences(S_candidate, S_best);
    while TabuList > TabuList_size do
      DeleteFeature(TabuList);
    end
  end
end
return S_best;



• Long-term memory structures can be introduced (complementing the
short-term memory) to encourage useful exploration of the broader
search space, called diversification. Strategies may include generating
solutions with rarely used components and biasing the generation
away from the most commonly used solution components.
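
A frequency-based long-term memory of the kind described in the last item can be sketched for tours. The helper names `edge_frequencies` and `penalized_cost`, and the linear penalty weight, are assumptions introduced for illustration only.

```ruby
# Count how often each undirected edge appears across previously visited tours.
def edge_frequencies(tours)
  freq = Hash.new(0)
  tours.each do |tour|
    tour.each_with_index do |c1, i|
      c2 = tour[(i + 1) % tour.size]
      freq[[c1, c2].sort] += 1
    end
  end
  freq
end

# Base cost plus a penalty proportional to how often the tour's edges were used.
def penalized_cost(tour, base_cost, freq, weight)
  usage = tour.each_with_index.inject(0) do |sum, (c1, i)|
    sum + freq[[c1, tour[(i + 1) % tour.size]].sort]
  end
  base_cost + weight * usage
end

history = [[0, 1, 2, 3], [0, 1, 3, 2]]           # hypothetical visited tours
freq = edge_frequencies(history)
puts penalized_cost([0, 1, 2, 3], 100, freq, 0.5)  # 103.0
puts penalized_cost([0, 2, 1, 3], 100, freq, 0.5)  # 102.0
```

With equal base costs the second tour is preferred because it relies on edges that have been used less often, which biases the search away from commonly used components.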

2.10.5 Code Listing 

Listing 2.9 provides an example of the Tabu Search algorithm implemented
in the Ruby Programming Language. The algorithm is applied to the
Berlin52 instance of the Traveling Salesman Problem (TSP), taken from
the TSPLIB. The problem seeks a permutation of the order to visit cities
(called a tour) that minimizes the total distance traveled. The optimal tour
distance for the Berlin52 instance is 7542 units.

The algorithm is an implementation of the simple Tabu Search with a
short term memory structure that executes for a fixed number of iterations.
The starting point for the search is prepared using a random permutation
that is refined using a stochastic 2-opt Local Search procedure. The stochastic
2-opt procedure is used as the embedded hill climbing heuristic with
a fixed sized candidate list. The two edges that are deleted in each 2-opt
move are stored on the tabu list. This general approach is similar to that
used by Knox in his work on Tabu Search for symmetrical TSP [12] and
Fiechter for the Parallel Tabu Search for the TSP [2].

# rounded Euclidean distance between two city coordinates
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# total length of the closed tour described by the permutation
def cost(perm, cities)
  distance = 0
  perm.each_with_index do |c1, i|
    c2 = (i==perm.size-1) ? perm[0] : perm[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# 2-opt move on a copy; also returns the two edges deleted from the parent
def stochastic_two_opt(parent)
  perm = Array.new(parent)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm, [[parent[c1-1], parent[c1]], [parent[c2-1], parent[c2]]]
end

# a tour is tabu if it contains any edge currently on the tabu list
def is_tabu?(permutation, tabu_list)
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    tabu_list.each do |forbidden_edge|
      return true if forbidden_edge == [c1, c2]
    end
  end
  return false
end

def generate_candidate(best, tabu_list, cities)
  perm, edges = nil, nil
  begin
    perm, edges = stochastic_two_opt(best[:vector])
  end while is_tabu?(perm, tabu_list)
  candidate = {:vector=>perm}
  candidate[:cost] = cost(candidate[:vector], cities)
  return candidate, edges
end

def search(cities, tabu_list_size, candidate_list_size, max_iter)
  current = {:vector=>random_permutation(cities)}
  current[:cost] = cost(current[:vector], cities)
  best = current
  # start with an empty short term memory of deleted edges
  tabu_list = []
  max_iter.times do |iter|
    candidates = Array.new(candidate_list_size) do |i|
      generate_candidate(current, tabu_list, cities)
    end
    candidates.sort! {|x,y| x.first[:cost] <=> y.first[:cost]}
    best_candidate = candidates.first[0]
    best_candidate_edges = candidates.first[1]
    if best_candidate[:cost] < current[:cost]
      current = best_candidate
      best = best_candidate if best_candidate[:cost] < best[:cost]
      best_candidate_edges.each {|edge| tabu_list.push(edge)}
      # drop the oldest edges first so that the most recent moves stay tabu
      tabu_list.shift while tabu_list.size > tabu_list_size
    end
    puts " > iteration #{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iter = 100
  tabu_list_size = 15
  max_candidates = 50
  # execute the algorithm
  best = search(berlin52, tabu_list_size, max_candidates, max_iter)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.9: Tabu Search in Ruby 
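
The short-term memory at the heart of the listing can be examined in isolation. The following is a minimal sketch, not part of the original listing, of a fixed-size tabu list of directed edges that forgets its oldest entries first. The class name TabuList and its methods are hypothetical and were chosen for illustration only.

```ruby
# Hypothetical helper (not part of Listing 2.9): a fixed-size tabu list
# of directed edges that evicts its oldest entries first.
class TabuList
  def initialize(max_size)
    @max_size = max_size
    @edges = []
  end

  # Record an edge as tabu, dropping the oldest edge when the list is full.
  def add(edge)
    @edges.push(edge)
    @edges.shift while @edges.size > @max_size
  end

  # True if any directed edge of the tour (including the closing edge) is tabu.
  def tabu?(perm)
    perm.each_with_index.any? do |c1, i|
      c2 = perm[(i + 1) % perm.size]
      @edges.include?([c1, c2])
    end
  end

  def size
    @edges.size
  end
end

tabu = TabuList.new(2)
tabu.add([0, 1])
tabu.add([2, 3])
tabu.add([1, 4])                    # the list is full, so [0, 1] is forgotten
puts tabu.tabu?([0, 1, 2, 3, 4])    # tour contains the tabu edge 2-3
puts tabu.tabu?([0, 1, 2, 4, 3])    # tour contains 0-1, which is no longer tabu
```

Evicting from the front of the array keeps the most recent moves prohibited, which is the behavior a tabu tenure is intended to provide.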



2.10.6 References

Primary Sources

Tabu Search was introduced by Glover, who applied it to scheduling employees to
duty rosters [9] and gave a more general overview in the context of the TSP [5],
based on his previous work on surrogate constraints on integer programming
problems [4]. Glover provided a seminal overview of the algorithm in a
two-part journal article, the first part of which introduced the algorithm
and reviewed then-recent applications [6], and the second of which focused on
advanced topics and open areas of research [7].



Learn More 

Glover provides a high-level introduction to Tabu Search in the form of a
practical tutorial [8], as do Glover and Taillard in a user guide format
[10]. The best source of information for Tabu Search is the book dedicated
to the approach by Glover and Laguna that covers the principles of the
technique in detail as well as an in-depth review of applications [11]. The
approach also appeared in Science, in an article that considered a modification
for its application to continuous function optimization problems [1]. Finally,
Gendreau provides an excellent contemporary review of the algorithm, highlighting
best practices and application heuristics collected from across the field of
study [3].

2.11 Reactive Tabu Search

Reactive Tabu Search, RTS, R-TABU, Reactive Taboo Search. 

2.11.1 Taxonomy 

Reactive Tabu Search is a Metaheuristic and a Global Optimization algo- 
rithm. It is an extension of Tabu Search (Section 2.10) and the basis for a 
field of reactive techniques called Reactive Local Search and more broadly 
the field of Reactive Search Optimization. 

2.11.2 Strategy 

The objective of Tabu Search is to avoid cycles while applying a local search 
technique. The Reactive Tabu Search addresses this objective by explicitly 
monitoring the search and reacting to the occurrence of cycles and their 
repetition by adapting the tabu tenure (tabu list size). The strategy of the
broader field of Reactive Search Optimization is to automate the process by
which a practitioner configures a search procedure, by monitoring its online
behavior and using machine learning techniques to adapt the technique's
configuration.

2.11.3 Procedure 

Algorithm 2.11.1 provides a pseudocode listing of the Reactive Tabu Search
algorithm for minimizing a cost function. The pseudocode is based on the
version of the Reactive Tabu Search described by Battiti and Tecchiolli in [9]
with supplements like the IsTabu function from [7]. The procedure has been
modified for brevity to exclude the diversification procedure (escape move).
Algorithm 2.11.2 describes the memory based reaction that manipulates
the size of the ProhibitionPeriod in response to identified cycles in the
ongoing search. Algorithm 2.11.3 describes the selection of the best move
from a list of candidate moves in the neighborhood of a given solution. The
function permits a prohibited move in the case where it is better than both
the best known solution and the selected admissible move (called
aspiration). Algorithm 2.11.4 determines whether a given neighborhood
move is tabu based on the current ProhibitionPeriod, and is employed
by sub-functions of Algorithm 2.11.3.
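
The memory based reaction can be sketched in a few lines of Ruby. The following is an illustrative reading of Algorithm 2.11.2 under stated assumptions rather than a definitive implementation: the function name react, the state hash and its keys, and the use of an opaque solution key are all hypothetical choices made for this example.

```ruby
# Illustrative sketch of the memory based reaction; the name react and the
# layout of the state hash are assumptions made for this example.
def react(state, solution_key, iter, problem_size, increase, decrease)
  last_visit = state[:visited][solution_key]
  if last_visit
    interval = iter - last_visit
    if interval < 2 * problem_size
      # a quick revisit: update the moving average and lengthen the prohibition
      state[:avg_interval] = 0.1 * interval + 0.9 * state[:avg_interval]
      state[:period] *= increase
      state[:last_change] = iter
    end
  end
  # in both branches the solution is stamped with the current iteration
  state[:visited][solution_key] = iter
  # no recent reaction: slowly shorten the prohibition, never below 1
  if iter - state[:last_change] > state[:avg_interval]
    state[:period] = [1.0, state[:period] * decrease].max
    state[:last_change] = iter
  end
  state[:period]
end

state = {:visited => {}, :avg_interval => 1.0, :period => 1.0, :last_change => 0}
[:a, :b, :a, :c, :d].each_with_index do |key, iter|
  period = react(state, key, iter, 5, 1.3, 0.9)
  puts "iter=#{iter}, solution=#{key}, prohibition period=#{period.round(2)}"
end
```

In this trace the repeated visit to solution a lengthens the prohibition period, and the later iterations without a repetition allow it to decay again.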

2.11.4 Heuristics 

• Reactive Tabu Search is an extension of Tabu Search and as such 
should exploit the best practices used for the parent algorithm. 

Algorithm 2.11.1: Pseudocode for Reactive Tabu Search.
Input: Iteration_max, Increase, Decrease, ProblemSize
Output: S_best

S_curr ← ConstructInitialSolution();
S_best ← S_curr;
TabuList ← ∅;
ProhibitionPeriod ← 1;
foreach Iteration_i ∈ Iteration_max do
    MemoryBasedReaction(Increase, Decrease, ProblemSize);
    CandidateList ← GenerateCandidateNeighborhood(S_curr);
    S_curr ← BestMove(CandidateList);
    TabuList ← S_curr_feature;
    if Cost(S_curr) ≤ Cost(S_best) then
        S_best ← S_curr;
    end
end
return S_best;



Algorithm 2.11.2: Pseudocode for the MemoryBasedReaction function.
Input: Increase, Decrease, ProblemSize
Output:

if HaveVisitedSolutionBefore(S_curr, VisitedSolutions) then
    S_curr_t ← RetrieveLastTimeVisited(VisitedSolutions, S_curr);
    RepetitionInterval ← Iteration_i - S_curr_t;
    S_curr_t ← Iteration_i;
    if RepetitionInterval < 2 × ProblemSize then
        RepetitionInterval_avg ← 0.1 × RepetitionInterval + 0.9 × RepetitionInterval_avg;
        ProhibitionPeriod ← ProhibitionPeriod × Increase;
        ProhibitionPeriod_t ← Iteration_i;
    end
else
    VisitedSolutions ← S_curr;
    S_curr_t ← Iteration_i;
end
if Iteration_i - ProhibitionPeriod_t > RepetitionInterval_avg then
    ProhibitionPeriod ← Max(1, ProhibitionPeriod × Decrease);
    ProhibitionPeriod_t ← Iteration_i;
end

Algorithm 2.11.3: Pseudocode for the BestMove function.
Input: ProblemSize
Output: S_curr

CandidateList_admissible ← GetAdmissibleMoves(CandidateList);
CandidateList_tabu ← CandidateList - CandidateList_admissible;
if Size(CandidateList_admissible) < 2 then
    ProhibitionPeriod ← ProblemSize - 2;
    ProhibitionPeriod_t ← Iteration_i;
end
S_curr ← GetBest(CandidateList_admissible);
S_besttabu ← GetBest(CandidateList_tabu);
if Cost(S_besttabu) < Cost(S_best) ∧ Cost(S_besttabu) < Cost(S_curr) then
    S_curr ← S_besttabu;
end
return S_curr;



Algorithm 2.11.4: Pseudocode for the IsTabu function.
Input:
Output: Tabu

Tabu ← FALSE;
S_curr_feature_t ← RetrieveTimeFeatureLastUsed(S_curr_feature);
if S_curr_feature_t ≥ Iteration_curr - ProhibitionPeriod then
    Tabu ← TRUE;
end
return Tabu;



• Reactive Tabu Search was designed for discrete domains such as
combinatorial optimization, although it has been applied to continuous
function optimization.

• Reactive Tabu Search was proposed to use efficient memory data
structures such as hash tables.

• Reactive Tabu Search was proposed to use a long-term memory to
diversify the search after a threshold of cycle repetitions has been
reached.

• The increase parameter should be greater than one (such as 1.1 or 
1.3) and the decrease parameter should be less than one (such as 0.9 
or 0.8). 
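
The heuristic concerning hash tables can be illustrated with a small sketch. It assumes that a tour is identified by its set of undirected edges, so that the same cycle started at a different city or traversed in reverse counts as a revisit. The function name tour_key is hypothetical and is not taken from the original sources.

```ruby
# Illustrative only: a canonical, hashable key for a tour, so that revisits
# can be detected with a Hash lookup instead of a linear scan of past tours.
def tour_key(perm)
  edges = perm.each_with_index.map do |c1, i|
    c2 = perm[(i + 1) % perm.size]
    c1 < c2 ? [c1, c2] : [c2, c1]   # undirected edge, smaller city first
  end
  edges.sort
end

visited = {}                             # tour key => iteration last visited
visited[tour_key([0, 1, 2, 3])] = 5

puts visited.key?(tour_key([2, 3, 0, 1]))   # same cycle, different start city
puts visited.key?(tour_key([3, 2, 1, 0]))   # same cycle, reversed direction
puts visited.key?(tour_key([0, 2, 1, 3]))   # a different tour
```

Ruby arrays hash by their contents, so the sorted edge list can be used directly as a Hash key. The cost of building the key is paid once per lookup rather than once per stored solution.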

2.11.5 Code Listing

Listing 2.10 provides an example of the Reactive Tabu Search algorithm 
implemented in the Ruby Programming Language. The algorithm is applied 
to the Berlin52 instance of the Traveling Salesman Problem (TSP), taken 
from the TSPLIB. The problem seeks a permutation of the order to visit 
cities (called a tour) that minimizes the total distance traveled. The optimal 
tour distance for the Berlin52 instance is 7542 units.

The procedure is based on the code listing described by Battiti and 
Tecchiolli in [9] with supplements like the IsTabu function from [7]. The 
implementation does not use efficient memory data structures such as hash 
tables. The algorithm is initialized with a random permutation of the cities,
and the neighborhood is generated as a fixed candidate list of stochastic 
2-opt moves. The edges selected for changing in the 2-opt move are stored 
as features in the tabu list. The example does not implement the escape 
procedure for search diversification. 

def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

def cost(perm, cities)
  distance = 0
  perm.each_with_index do |c1, i|
    c2 = (i==perm.size-1) ? perm[0] : perm[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

def stochastic_two_opt(parent)
  perm = Array.new(parent)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm, [[parent[c1-1], parent[c1]], [parent[c2-1], parent[c2]]]
end

def is_tabu?(edge, tabu_list, iter, prohib_period)
  tabu_list.each do |entry|
    if entry[:edge] == edge
      return true if entry[:iter] >= iter-prohib_period
      return false
    end
  end
  return false
end

def make_tabu(tabu_list, edge, iter)
  tabu_list.each do |entry|
    if entry[:edge] == edge
      entry[:iter] = iter
      return entry
    end
  end
  entry = {:edge=>edge, :iter=>iter}
  tabu_list.push(entry)
  return entry
end

def to_edge_list(perm)
  list = []
  perm.each_with_index do |c1, i|
    c2 = (i==perm.size-1) ? perm[0] : perm[i+1]
    c1, c2 = c2, c1 if c1 > c2
    list << [c1, c2]
  end
  return list
end

def equivalent?(el1, el2)
  el1.each {|e| return false if !el2.include?(e) }
  return true
end

def generate_candidate(best, cities)
  candidate = {}
  candidate[:vector], edges = stochastic_two_opt(best[:vector])
  candidate[:cost] = cost(candidate[:vector], cities)
  return candidate, edges
end

def get_candidate_entry(visited_list, permutation)
  edgeList = to_edge_list(permutation)
  visited_list.each do |entry|
    return entry if equivalent?(edgeList, entry[:edgelist])
  end
  return nil
end

def store_permutation(visited_list, permutation, iteration)
  entry = {}
  entry[:edgelist] = to_edge_list(permutation)
  entry[:iter] = iteration
  entry[:visits] = 1
  visited_list.push(entry)
  return entry
end

def sort_neighborhood(candidates, tabu_list, prohib_period, iteration)
  tabu, admissable = [], []
  candidates.each do |a|
    if is_tabu?(a[1][0], tabu_list, iteration, prohib_period) or
      is_tabu?(a[1][1], tabu_list, iteration, prohib_period)
      tabu << a
    else
      admissable << a
    end
  end
  return [tabu, admissable]
end

def search(cities, max_cand, max_iter, increase, decrease)
  current = {:vector=>random_permutation(cities)}
  current[:cost] = cost(current[:vector], cities)
  best = current
  tabu_list, prohib_period = [], 1
  visited_list, avg_size, last_change = [], 1, 0
  max_iter.times do |iter|
    candidate_entry = get_candidate_entry(visited_list, current[:vector])
    if !candidate_entry.nil?
      repetition_interval = iter - candidate_entry[:iter]
      candidate_entry[:iter] = iter
      candidate_entry[:visits] += 1
      if repetition_interval < 2*(cities.size-1)
        # use the interval measured before the entry's timestamp was reset
        avg_size = 0.1*repetition_interval + 0.9*avg_size
        prohib_period = (prohib_period.to_f * increase)
        last_change = iter
      end
    else
      store_permutation(visited_list, current[:vector], iter)
    end
    if iter-last_change > avg_size
      prohib_period = [prohib_period*decrease, 1].max
      last_change = iter
    end
    candidates = Array.new(max_cand) do |i|
      generate_candidate(current, cities)
    end
    candidates.sort! {|x,y| x.first[:cost] <=> y.first[:cost]}
    tabu,admis = sort_neighborhood(candidates,tabu_list,prohib_period,iter)
    if admis.size < 2
      prohib_period = cities.size-2
      last_change = iter
    end
    current,best_move_edges = (admis.empty?) ? tabu.first : admis.first
    if !tabu.empty?
      tf = tabu.first[0]
      if tf[:cost]<best[:cost] and tf[:cost]<current[:cost]
        current, best_move_edges = tabu.first
      end
    end
    best_move_edges.each {|edge| make_tabu(tabu_list, edge, iter)}
    best = candidates.first[0] if candidates.first[0][:cost] < best[:cost]
    puts " > it=#{iter}, tenure=#{prohib_period.round}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iter = 100
  max_candidates = 50
  increase = 1.3
  decrease = 0.9
  # execute the algorithm
  best = search(berlin52, max_candidates, max_iter, increase, decrease)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.10: Reactive Tabu Search in Ruby 
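
The 2-opt move and the edges it records as tabu features are easier to follow with fixed cut points. The following sketch is a deterministic variant of the stochastic_two_opt function from the listing, written for illustration only. The function name two_opt and the explicit c1 and c2 arguments are assumptions introduced here.

```ruby
# Deterministic variant of the listing's stochastic_two_opt, for illustration:
# the cut points c1 and c2 are passed in instead of being drawn at random.
def two_opt(parent, c1, c2)
  perm = Array.new(parent)
  c1, c2 = c2, c1 if c2 < c1
  # reverse the segment between the two cut points
  perm[c1...c2] = perm[c1...c2].reverse
  # the two edges of the parent tour that the move breaks
  removed = [[parent[c1-1], parent[c1]], [parent[c2-1], parent[c2]]]
  return perm, removed
end

tour = [0, 1, 2, 3, 4, 5]
child, edges = two_opt(tour, 1, 4)
puts child.inspect   # [0, 3, 2, 1, 4, 5]
puts edges.inspect   # [[0, 1], [3, 4]]
```

The edges 0-1 and 3-4 are removed from the parent tour and replaced by 0-3 and 1-4 in the child. It is the removed edges that the listing stores as features in the tabu list.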



2.11.6 References 

Primary Sources 

Reactive Tabu Search was proposed by Battiti and Tecchiolli as an extension
to Tabu Search that included an adaptive tabu list size in addition to a
diversification mechanism [7]. The technique also used efficient memory
structures that were based on an earlier work by Battiti and Tecchiolli that
considered a parallel tabu search [6]. Some early application papers by
Battiti and Tecchiolli include a comparison to Simulated Annealing applied
to the Quadratic Assignment Problem [8], a benchmark on instances of
the knapsack problem and N-K models with comparisons to Repeated Local
Minima Search, Simulated Annealing, and Genetic Algorithms [9], and the
training of neural networks on an array of problem instances [10].

Learn More 

Reactive Tabu Search was abstracted to a form called Reactive Local 
Search that considers adaptive methods that learn suitable parameters for 
heuristics that manage an embedded local search technique [4, 5]. Under 
this abstraction, the Reactive Tabu Search algorithm is a single example 
of the Reactive Local Search principle applied to the Tabu Search. This
framework was further extended to the use of any adaptive machine learning 
techniques to adapt the parameters of an algorithm by reacting to algorithm 
outcomes online while solving a problem, called Reactive Search [1]. The 
best reference for this general framework is the book on Reactive Search 
Optimization by Battiti, Brunato, and Mascia [3]. Additionally, the review 
chapter by Battiti and Brunato provides a contemporary description [2]. 

3.1 Overview

This chapter describes Evolutionary Algorithms. 

3.1.1 Evolution 

Evolutionary Algorithms belong to the Evolutionary Computation field of 
study concerned with computational methods inspired by the process and 
mechanisms of biological evolution. The process of evolution by means 
of natural selection (descent with modification) was proposed by Darwin 
to account for the variety of life and its suitability (adaptive fit) for its 
environment. The mechanisms of evolution describe how evolution actually 
takes place through the modification and propagation of genetic material 
(proteins). Evolutionary Algorithms are concerned with investigating com- 
putational systems that resemble simplified versions of the processes and 
mechanisms of evolution toward achieving the effects of these processes 
and mechanisms, namely the development of adaptive systems. Additional 
subject areas that fall within the realm of Evolutionary Computation are 
algorithms that seek to exploit the properties from the related fields of 
Population Genetics, Population Ecology, Coevolutionary Biology, and 
Developmental Biology. 

3.1.2 References 

Evolutionary Algorithms share properties of adaptation through an iterative 
process that accumulates and amplifies beneficial variation through trial 
and error. Candidate solutions represent members of a virtual population 
striving to survive in an environment defined by a problem specific objective 
function. In each case, the evolutionary process refines the adaptive fit of 
the population of candidate solutions in the environment, typically using 
surrogates for the mechanisms of evolution such as genetic recombination
and mutation. 

There are many excellent texts on the theory of evolution, although 
Darwin's original source can be an interesting and surprisingly enjoyable 
read [5]. Huxley's book defined the modern synthesis in evolutionary biology 
that combined Darwin's natural selection with Mendel's genetic mechanisms 
[25], although any good textbook on evolution will suffice (such as Futuyma's
"Evolution" [13]). Popular science books on evolution are an easy place to
start, such as Dawkins' "The Selfish Gene" that presents a gene-centric
perspective on evolution [6], and Dennett's "Darwin's Dangerous Idea" that
considers the algorithmic properties of the process [8].

Goldberg's classic text is still a valuable resource for the Genetic Algo- 
rithm [14], and Holland's text is interesting for those looking to learn about 
the research into adaptive systems that became the Genetic Algorithm 
[23]. Additionally, the seminal work by Koza should be considered for 
those interested in Genetic Programming [30], and Schwefel's seminal work 
should be considered for those with an interest in Evolution Strategies [34]. 
For an in-depth review of the history of research into the use of simulated
evolutionary processes for problem solving, see Fogel [12]. For a rounded and
modern review of the field of Evolutionary Computation, Bäck, Fogel, and
Michalewicz's two volumes of "Evolutionary Computation" are an excellent
resource covering the major techniques, theory, and application specific 
concerns [2, 3]. For some additional modern books on the unified field of 
Evolutionary Computation and Evolutionary Algorithms, see De Jong [26], 
a recent edition of Fogel [11], and Eiben and Smith [9]. 

3.1.3 Extensions 

There are many other algorithms and classes of algorithm from the field of
Evolutionary Computation that were not described, including but not limited to:

• Distributed Evolutionary Computation: algorithms designed to
partition a population across computer networks or computational
units, such as the Distributed or 'Island Population' Genetic Algorithm
[4, 35] and Diffusion Genetic Algorithms (also known as Cellular
Genetic Algorithms) [1].

• Niching Genetic Algorithms: algorithms that form groups or
sub-populations automatically within a population, such as the
Deterministic Crowding Genetic Algorithm [31, 32], Restricted Tournament
Selection [20, 21], and the Fitness Sharing Genetic Algorithm [7, 19].

• Evolutionary Multiple Objective Optimization Algorithms:
such as the Vector-Evaluated Genetic Algorithm (VEGA) [33], Pareto
Archived Evolution Strategy (PAES) [28, 29], and the Niched Pareto
Genetic Algorithm (NPGA) [24].

• Classical Techniques: such as GENITOR [36], and the CHC Genetic
Algorithm [10].

• Competent Genetic Algorithms: (so-called [15]) such as the
Messy Genetic Algorithm [17, 18], Fast Messy Genetic Algorithm
[16], Gene Expression Messy Genetic Algorithm [27], and the
Linkage-Learning Genetic Algorithm [22].

3.2 Genetic Algorithm

Genetic Algorithm, GA, Simple Genetic Algorithm, SGA, Canonical Genetic 
Algorithm, CGA. 

3.2.1 Taxonomy 

The Genetic Algorithm is an Adaptive Strategy and a Global Optimization 
technique. It is an Evolutionary Algorithm and belongs to the broader 
study of Evolutionary Computation. The Genetic Algorithm is a sibling of 
other Evolutionary Algorithms such as Genetic Programming (Section 3.3), 
Evolution Strategies (Section 3.4), Evolutionary Programming (Section 3.6), 
and Learning Classifier Systems (Section 3.9). The Genetic Algorithm is a 
parent of a large number of variant techniques and sub-fields too numerous 
to list. 

3.2.2 Inspiration 

The Genetic Algorithm is inspired by population genetics (including heredity 
and gene frequencies), and evolution at the population level, as well as the 
Mendelian understanding of the structure (such as chromosomes, genes, 
alleles) and mechanisms (such as recombination and mutation). This is the 
so-called new or modern synthesis of evolutionary biology. 

3.2.3 Metaphor 

Individuals of a population contribute their genetic material (called the
genotype) in the form of offspring, in proportion to the suitability of their
expressed genome (called the phenotype) to their environment. The next
generation is created through a process of mating that involves recombination
of two individuals' genomes in the population with the introduction of random
copying errors (called mutation). This iterative process may result in an
improved adaptive-fit between the phenotypes of individuals in a population
and the environment.

3.2.4 Strategy 

The objective of the Genetic Algorithm is to maximize the payoff of candidate 
solutions in the population against a cost function from the problem domain. 
The strategy for the Genetic Algorithm is to repeatedly employ surrogates 
for the recombination and mutation genetic mechanisms on the population 
of candidate solutions, where the cost function (also known as objective or 
fitness function) applied to a decoded representation of a candidate governs 
the probabilistic contributions a given candidate solution can make to the 
subsequent generation of candidate solutions. 

3.2.5 Procedure

Algorithm 3.2.1 provides a pseudocode listing of the Genetic Algorithm for 
minimizing a cost function. 



Algorithm 3.2.1: Pseudocode for the Genetic Algorithm.
Input: Population_size, Problem_size, P_crossover, P_mutation
Output: S_best

Population ← InitializePopulation(Population_size, Problem_size);
EvaluatePopulation(Population);
S_best ← GetBestSolution(Population);
while ¬StopCondition() do
    Parents ← SelectParents(Population, Population_size);
    Children ← ∅;
    foreach Parent_1, Parent_2 ∈ Parents do
        Child_1, Child_2 ← Crossover(Parent_1, Parent_2, P_crossover);
        Children ← Mutate(Child_1, P_mutation);
        Children ← Mutate(Child_2, P_mutation);
    end
    EvaluatePopulation(Children);
    S_best ← GetBestSolution(Children);
    Population ← Replace(Population, Children);
end
return S_best;



3.2.6 Heuristics 

• Binary strings (referred to as 'bitstrings') are the classical represen- 
tation as they can be decoded to almost any desired representation. 
Real-valued and integer variables can be decoded using the binary
coded decimal method, one's or two's complement methods, or the 
gray code method, the latter of which is generally preferred. 

• Problem specific representations and customized genetic operators 
should be adopted, incorporating as much prior information about 
the problem domain as possible. 

• The size of the population must be large enough to provide sufficient 
coverage of the domain and mixing of the useful sub-components of 
the solution [7]. 

• The Genetic Algorithm is classically configured with a high probability
of recombination (such as 95%-99% of the selected population) and
a low probability of mutation (such as 1/L where L is the number of
components in a solution) [1, 18].

• The fitness-proportionate selection of candidate solutions to contribute 
to the next generation should be neither too greedy (to avoid the 
takeover of fitter candidate solutions) nor too random. 
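
The gray code decoding mentioned in the first heuristic can be sketched as follows. This is a minimal example that assumes the standard binary-reflected Gray code and a linear mapping onto a real-valued interval. The function names gray_to_int and decode_real are hypothetical and do not appear in the listings of this book.

```ruby
# Decode a binary-reflected Gray code bitstring (most significant bit first)
# into an integer.
def gray_to_int(bitstring)
  bit = 0
  value = 0
  bitstring.each_char do |c|
    bit ^= c.to_i                 # each binary bit is the XOR of the Gray bits so far
    value = (value << 1) | bit
  end
  value
end

# Map the decoded integer linearly onto the real-valued interval [min, max].
def decode_real(bitstring, min, max)
  max_int = (2 ** bitstring.size) - 1
  min + (max - min) * gray_to_int(bitstring).to_f / max_int
end

puts gray_to_int("010")              # 3
puts gray_to_int("110")              # 4
puts decode_real("1000", -5.0, 5.0)  # 5.0
```

The integers 3 and 4 differ in all three bits under a plain binary encoding (011 and 100) but in only one bit under the Gray code (010 and 110), so a single point mutation can step between neighboring values. This property is commonly given as the reason for preferring the method.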

3.2.7 Code Listing 

Listing 3.1 provides an example of the Genetic Algorithm implemented in the 
Ruby Programming Language. The demonstration problem is a maximizing 
binary optimization problem called OneMax that seeks a binary string of 
unity (all '1' bits). The objective function provides only an indication of 
the number of correct bits in a candidate string, not the positions of the 
correct bits. 

The Genetic Algorithm is implemented with a conservative configuration 
including binary tournament selection for the selection operator, one-point 
crossover for the recombination operator, and point mutations for the 
mutation operator. 

def onemax(bitstring)
  sum = 0
  bitstring.size.times {|i| sum+=1 if bitstring[i].chr=='1'}
  return sum
end

def random_bitstring(num_bits)
  return (0...num_bits).inject(""){|s,i| s<<((rand<0.5) ? "1" : "0")}
end

def binary_tournament(pop)
  i, j = rand(pop.size), rand(pop.size)
  j = rand(pop.size) while j==i
  return (pop[i][:fitness] > pop[j][:fitness]) ? pop[i] : pop[j]
end

def point_mutation(bitstring, rate=1.0/bitstring.size)
  child = ""
  bitstring.size.times do |i|
    bit = bitstring[i].chr
    child << ((rand()<rate) ? ((bit=='1') ? "0" : "1") : bit)
  end
  return child
end

def crossover(parent1, parent2, rate)
  return ""+parent1 if rand()>=rate
  point = 1 + rand(parent1.size-2)
  return parent1[0...point]+parent2[point...(parent1.size)]
end

def reproduce(selected, pop_size, p_cross, p_mutation)
  children = []
  selected.each_with_index do |p1, i|
    p2 = (i.modulo(2)==0) ? selected[i+1] : selected[i-1]
    p2 = selected[0] if i == selected.size-1
    child = {}
    child[:bitstring] = crossover(p1[:bitstring], p2[:bitstring], p_cross)
    child[:bitstring] = point_mutation(child[:bitstring], p_mutation)
    children << child
    break if children.size >= pop_size
  end
  return children
end

def search(max_gens, num_bits, pop_size, p_crossover, p_mutation)
  population = Array.new(pop_size) do |i|
    {:bitstring=>random_bitstring(num_bits)}
  end
  population.each{|c| c[:fitness] = onemax(c[:bitstring])}
  best = population.sort{|x,y| y[:fitness] <=> x[:fitness]}.first
  max_gens.times do |gen|
    selected = Array.new(pop_size){|i| binary_tournament(population)}
    children = reproduce(selected, pop_size, p_crossover, p_mutation)
    children.each{|c| c[:fitness] = onemax(c[:bitstring])}
    children.sort!{|x,y| y[:fitness] <=> x[:fitness]}
    best = children.first if children.first[:fitness] >= best[:fitness]
    population = children
    puts " > gen #{gen}, best: #{best[:fitness]}, #{best[:bitstring]}"
    break if best[:fitness] == num_bits
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  num_bits = 64
  # algorithm configuration
  max_gens = 100
  pop_size = 100
  p_crossover = 0.98
  p_mutation = 1.0/num_bits
  # execute the algorithm
  best = search(max_gens, num_bits, pop_size, p_crossover, p_mutation)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:bitstring]}"
end

Listing 3.1: Genetic Algorithm in Ruby 
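
The listing uses binary tournament selection, whereas the heuristics refer to fitness-proportionate selection. The following is a minimal sketch of the latter (often called roulette wheel selection) that operates on the same population structure of hashes with a :fitness key. The function name roulette_select and its optional rng argument are assumptions added for this example, with the argument included so that the demonstration is repeatable.

```ruby
# Fitness-proportionate ("roulette wheel") selection, shown as an illustrative
# alternative to binary_tournament. The rng argument is an assumption added
# here so that the example can be seeded.
def roulette_select(pop, rng = Random.new)
  total = pop.inject(0.0) { |sum, c| sum + c[:fitness] }
  # degenerate case: no fitness anywhere, so pick uniformly at random
  return pop[rng.rand(pop.size)] if total <= 0.0
  spin = rng.rand * total
  pop.each do |c|
    spin -= c[:fitness]
    return c if spin < 0.0
  end
  pop.last   # guard against floating-point round-off
end

pop = [
  {:bitstring => "0000", :fitness => 0},
  {:bitstring => "1100", :fitness => 2},
  {:bitstring => "1111", :fitness => 4}
]
counts = Hash.new(0)
rng = Random.new(1)
6000.times { counts[roulette_select(pop, rng)[:bitstring]] += 1 }
puts counts.inspect
```

Each candidate is chosen with probability equal to its share of the total fitness, so the string with fitness 4 should be drawn roughly twice as often as the string with fitness 2, and the string with zero fitness is never drawn. This sensitivity to the raw fitness scale is one reason the heuristics caution against selection that is too greedy.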



3.2.8 References 

Primary Sources 

Holland is the grandfather of the field that became Genetic Algorithms.
Holland investigated adaptive systems in the late 1960s, proposing an adaptive
system formalism and adaptive strategies referred to as 'adaptive plans'
[8-10]. Holland's theoretical framework was investigated and elaborated
by his Ph.D. students at the University of Michigan. Rosenberg investigated
a chemical and molecular model of a biologically inspired adaptive plan
[19]. Bagley investigated meta-environments and a genetic adaptive plan
referred to as a genetic algorithm applied to a simple game called hexapawn
[2]. Cavicchio further elaborated the genetic adaptive plan by proposing
numerous variations, referring to some as 'reproductive plans' [15].

Other important contributions were made by Frantz who investigated 
what were referred to as genetic algorithms for search [3], and Hollstien who 
investigated genetic plans for adaptive control and function optimization [12]. 
De Jong performed a seminal investigation of the genetic adaptive model
(genetic plans) applied to continuous function optimization, and the suite of
test problems he adopted is still commonly used [13]. Holland wrote the
seminal book on his research, focusing on the proposed adaptive systems
formalism and the reproductive and genetic adaptive plans, and providing a
theoretical framework for the mechanisms used and an explanation for the
capabilities of what would become genetic algorithms [11].

Learn More 

The field of genetic algorithms is very large, resulting in large numbers of 
variations on the canonical technique. Goldberg provides a classical overview 
of the field in a review article [5], as does Mitchell [16]. Whitley provides
a classical tutorial for the Genetic Algorithm covering both practical and
theoretical concerns [20].

The algorithm is highly modular and a sub-field exists to study each
sub-process, specifically: selection, recombination, mutation, and representation.
The Genetic Algorithm is most commonly used as an optimization technique, 
although it should also be considered a general adaptive strategy [14]. The 
schema theorem is a classical explanation for the power of the Genetic 
Algorithm proposed by Holland [11], and investigated by Goldberg under 
the name of the building block hypothesis [4]. 

The classical book on genetic algorithms as an optimization and machine 
learning technique was written by Goldberg and provides an in-depth review 
and practical study of the approach [4]. Mitchell provides a contemporary 
reference text introducing the technique and the field [17]. Finally, Goldberg 
provides a modern study of the field, the lessons learned, and reviews the 
broader toolset of optimization algorithms that the field has produced [6]. 

3.3 Genetic Programming

Genetic Programming, GP. 

3.3.1 Taxonomy 

The Genetic Programming algorithm is an example of an Evolutionary 
Algorithm and belongs to the field of Evolutionary Computation and more 
broadly Computational Intelligence and Biologically Inspired Computation.
The Genetic Programming algorithm is a sibling to other Evolutionary 
Algorithms such as the Genetic Algorithm (Section 3.2), Evolution Strate- 
gies (Section 3.4), Evolutionary Programming (Section 3.6), and Learning 
Classifier Systems (Section 3.9). Technically, the Genetic Programming 
algorithm is an extension of the Genetic Algorithm. The Genetic Algorithm 
is a parent to a host of variations and extensions. 

3.3.2 Inspiration 

The Genetic Programming algorithm is inspired by population genetics 
(including heredity and gene frequencies), and evolution at the population 
level, as well as the Mendelian understanding of the structure (such as 
chromosomes, genes, alleles) and mechanisms (such as recombination and 
mutation). This is the so-called new or modern synthesis of evolutionary 
biology. 

3.3.3 Metaphor 

Individuals of a population contribute their genetic material (called the
genotype) in proportion to the suitability of their expressed genome (called
their phenotype) to their environment. The next generation is created
through a process of mating that involves genetic operators such as
recombination of two individuals' genomes in the population and the introduction
of random copying errors (called mutation). This iterative process may
result in an improved adaptive fit between the phenotypes of individuals in
a population and the environment.

Programs may be evolved and used in a secondary adaptive process, 
where an assessment of candidates at the end of that secondary adaptive 
process is used for differential reproductive success in the first evolution- 
ary process. This system may be understood as the inter-dependencies 
experienced in evolutionary development where evolution operates upon 
an embryo that in turn develops into an individual in an environment that 
eventually may reproduce.

3.3.4 Strategy

The objective of the Genetic Programming algorithm is to use induction to 
devise a computer program. This is achieved by using evolutionary operators 
on candidate programs with a tree structure to improve the adaptive fit 
between the population of candidate programs and an objective function. 
An assessment of a candidate solution involves its execution. 

3.3.5 Procedure 

Algorithm 3.3.1 provides a pseudocode listing of the Genetic Programming
algorithm for minimizing a cost function, based on Koza and Poli's tutorial [9].

The Genetic Program uses LISP-like symbolic expressions called S- 
expressions that represent the graph of a program with function nodes and 
terminal nodes. While the algorithm is running, the programs are treated 
like data, and when they are evaluated they are executed. The traversal of 
a program graph is always depth first, and functions must always return a 
value. 
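The idea of treating a program as data until it is executed can be sketched as follows. This is a minimal illustration, assuming programs are stored as nested Ruby arrays of the form [function, argument1, argument2]; the helper name eval_sexp and the env argument are hypothetical and chosen only for this example.

```ruby
# Evaluate a program stored as nested arrays: [function, arg1, arg2].
# Terminals are either numbers or variable names looked up in env.
# eval_sexp is an illustrative name, not part of the book's listing.
def eval_sexp(node, env)
  # Terminal node: a variable name or a numeric constant.
  return env.fetch(node, node).to_f unless node.is_a?(Array)
  # Function node: evaluate both arguments first (depth-first traversal).
  left = eval_sexp(node[1], env)
  right = eval_sexp(node[2], env)
  # Protected division so that every function returns a usable value.
  return 0.0 if node[0] == :/ && right == 0.0
  left.send(node[0], right)
end

# The program (+ (* X X) (+ X 1)) is held as plain data...
program = [:+, [:*, 'X', 'X'], [:+, 'X', 1]]
# ...and is only executed when it is evaluated, here for X = 2.
result = eval_sexp(program, 'X' => 2.0)
puts result
```

Because the program is an ordinary array, the genetic operators can copy, splice, and replace sub-trees without any knowledge of what the program computes.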

3.3.6 Heuristics 

• The Genetic Programming algorithm was designed for inductive auto- 
matic programming and is well suited to symbolic regression, controller 
design, and machine learning tasks under the broader name of function 
approximation. 

• Traditionally Lisp symbolic expressions are evolved and evaluated 
in a virtual machine, although the approach has been applied with 
compiled programming languages. 

• The evaluation (fitness assignment) of a candidate solution typically 
takes the structure of the program into account, rewarding parsimony. 

• The selection process should be balanced between random selection and
greedy selection to bias the search towards fitter candidate solutions
(exploitation), whilst introducing useful diversity into the population
(exploration).

• A program may respond to zero or more input values and may produce 
one or more outputs. 

• All functions used in the function node set must return a usable result. 
For example, the division function must return a sensible value (such 
as zero or one) when a division by zero occurs. 

• All genetic operations ensure (or should ensure) that syntactically valid 
and executable programs are produced as a result of their application.

Algorithm 3.3.1: Pseudocode for Genetic Programming.

Input: Population_size, nodes_func, nodes_term, P_crossover, P_mutation,
       P_reproduction, P_alteration
Output: S_best

Population ← InitializePopulation(Population_size, nodes_func, nodes_term);
EvaluatePopulation(Population);
S_best ← GetBestSolution(Population);
while ¬StopCondition() do
    Children ← ∅;
    while Size(Children) < Population_size do
        Operator ← SelectGeneticOperator(P_crossover, P_mutation,
                                         P_reproduction, P_alteration);
        if Operator = CrossoverOperator then
            Parent1, Parent2 ← SelectParents(Population, Population_size);
            Child1, Child2 ← Crossover(Parent1, Parent2);
            Children ← Child1;
            Children ← Child2;
        else if Operator = MutationOperator then
            Parent1 ← SelectParents(Population, Population_size);
            Child1 ← Mutate(Parent1);
            Children ← Child1;
        else if Operator = ReproductionOperator then
            Parent1 ← SelectParents(Population, Population_size);
            Child1 ← Reproduce(Parent1);
            Children ← Child1;
        else if Operator = AlterationOperator then
            Parent1 ← SelectParents(Population, Population_size);
            Child1 ← AlterArchitecture(Parent1);
            Children ← Child1;
        end
    end
    EvaluatePopulation(Children);
    S_best ← GetBestSolution(Children, S_best);
    Population ← Children;
end
return S_best;






• The Genetic Programming algorithm is commonly configured with a 
high-probability of crossover (> 90%) and a low-probability of muta- 
tion (< 1%). Other operators such as reproduction and architecture 
alterations are used with moderate-level probabilities and fill in the 
probabilistic gap. 

• Architecture altering operations are not limited to the duplication 
and deletion of sub-structures of a given program. 

• The crossover genetic operator in the algorithm is commonly configured
to select a function as the cross-point with a high probability (> 90%)
and a terminal as the cross-point with a low probability (< 10%).

• The function set may also include control structures such as conditional 
statements and loop constructs. 

• The Genetic Programming algorithm can be realized as a stack-based
virtual machine as opposed to a call graph [11].

• The Genetic Programming algorithm can make use of Automatically 
Defined Functions (ADFs) that are sub-graphs and are promoted to 
the status of functions for reuse and are co-evolved with the programs. 

• The genetic operators employed during reproduction in the algorithm 
may be considered transformation programs for candidate solutions 
and may themselves be co-evolved in the algorithm [1]. 
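The heuristic of preferring function nodes as crossover points can be sketched as below. This is an illustrative sketch only, assuming programs are nested arrays of the form [function, argument1, argument2] and that nodes are numbered in pre-order; the helper names node_indices and select_cross_point and the p_func parameter are hypothetical.

```ruby
# Collect the pre-order indices of function nodes and terminal nodes in a
# program stored as nested arrays [function, arg1, arg2].
def node_indices(node, index = 0, funcs = [], terms = [])
  if node.is_a?(Array)
    funcs << index
    index += 1
    index = node_indices(node[1], index, funcs, terms)[0]
    index = node_indices(node[2], index, funcs, terms)[0]
  else
    terms << index
    index += 1
  end
  [index, funcs, terms]
end

# Choose a crossover point, preferring function nodes with probability
# p_func (0.9 follows the heuristic above; it is a tunable assumption).
def select_cross_point(program, p_func = 0.9)
  _, funcs, terms = node_indices(program)
  pool = (rand() < p_func && !funcs.empty?) ? funcs : terms
  pool[rand(pool.size)]
end

program = [:+, [:*, 'X', 'X'], [:+, 'X', 1]]
point = select_cross_point(program)
puts "crossover point (pre-order index): #{point}"
```

Note that index 0 is the root of the program. An implementation that should never exchange whole programs could drop index 0 from the pool of function nodes before sampling.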

3.3.7 Code Listing 

Listing 3.2 provides an example of the Genetic Programming algorithm
implemented in the Ruby Programming Language based on Koza and Poli's
tutorial [9].

The demonstration problem is an instance of symbolic regression, where
a function must be devised to match a set of observations. In this case the
target function is the quadratic polynomial x^2 + x + 1 where x ∈ [-1, 1]. The
observations are generated directly from the target function without noise
for the purposes of this example. In practical problems, if one knew and
had access to the target function then the genetic program would not be
required.

The algorithm is configured to search for a program with the function
set {+, -, ×, ÷} and the terminal set {X, R}, where X is the input value,
and R is a static random variable generated for a program, R ∈ [-5, 5]. A
division by zero returns a value of zero. The fitness of a candidate solution is
calculated by evaluating the program on a range of random input values and
calculating the mean absolute error. The algorithm is configured
with a 90% probability of crossover, 8% probability of reproduction
(copying), and a 2% probability of mutation. For brevity, the algorithm
does not implement the architecture altering genetic operation and does not
bias crossover points towards functions over terminals.

def rand_in_bounds(min, max)
  return min + (max-min)*rand()
end

# render a program tree as a LISP-like S-expression string
def print_program(node)
  return node if !node.kind_of?(Array)
  return "(#{node[0]} #{print_program(node[1])} #{print_program(node[2])})"
end

# execute a program depth first; map binds terminal names to input values
def eval_program(node, map)
  if !node.kind_of?(Array)
    return map[node].to_f if !map[node].nil?
    return node.to_f
  end
  arg1, arg2 = eval_program(node[1], map), eval_program(node[2], map)
  # protected division: a division by zero returns zero
  return 0 if node[0] === :/ and arg2 == 0.0
  return arg1.send(node[0], arg2)
end

def generate_random_program(max, funcs, terms, depth=0)
  if depth==max-1 or (depth>1 and rand()<0.1)
    t = terms[rand(terms.size)]
    return ((t=='R') ? rand_in_bounds(-5.0, +5.0) : t)
  end
  depth += 1
  arg1 = generate_random_program(max, funcs, terms, depth)
  arg2 = generate_random_program(max, funcs, terms, depth)
  return [funcs[rand(funcs.size)], arg1, arg2]
end

def count_nodes(node)
  return 1 if !node.kind_of?(Array)
  a1 = count_nodes(node[1])
  a2 = count_nodes(node[2])
  return a1+a2+1
end

def target_function(input)
  return input**2 + input + 1
end

# mean absolute error over randomly sampled inputs (lower is better)
def fitness(program, num_trials=20)
  sum_error = 0.0
  num_trials.times do |i|
    input = rand_in_bounds(-1.0, 1.0)
    error = eval_program(program, {'X'=>input}) - target_function(input)
    sum_error += error.abs
  end
  return sum_error / num_trials.to_f
end

def tournament_selection(pop, bouts)
  selected = Array.new(bouts){pop[rand(pop.size)]}
  selected.sort!{|x,y| x[:fitness]<=>y[:fitness]}
  return selected.first
end

# return a copy of the tree with the node at index node_num replaced
def replace_node(node, replacement, node_num, cur_node=0)
  return [replacement,(cur_node+1)] if cur_node == node_num
  cur_node += 1
  return [node,cur_node] if !node.kind_of?(Array)
  a1, cur_node = replace_node(node[1], replacement, node_num, cur_node)
  a2, cur_node = replace_node(node[2], replacement, node_num, cur_node)
  return [[node[0], a1, a2], cur_node]
end

def copy_program(node)
  return node if !node.kind_of?(Array)
  return [node[0], copy_program(node[1]), copy_program(node[2])]
end

# locate the sub-tree at index node_num (nodes are numbered depth first)
def get_node(node, node_num, current_node=0)
  return node,(current_node+1) if current_node == node_num
  current_node += 1
  return nil,current_node if !node.kind_of?(Array)
  a1, current_node = get_node(node[1], node_num, current_node)
  return a1,current_node if !a1.nil?
  a2, current_node = get_node(node[2], node_num, current_node)
  return a2,current_node if !a2.nil?
  return nil,current_node
end

# limit tree depth by replacing nodes at the maximum depth with terminals
def prune(node, max_depth, terms, depth=0)
  if depth == max_depth-1
    t = terms[rand(terms.size)]
    return ((t=='R') ? rand_in_bounds(-5.0, +5.0) : t)
  end
  depth += 1
  return node if !node.kind_of?(Array)
  a1 = prune(node[1], max_depth, terms, depth)
  a2 = prune(node[2], max_depth, terms, depth)
  return [node[0], a1, a2]
end

# exchange randomly selected sub-trees between two parents
def crossover(parent1, parent2, max_depth, terms)
  pt1, pt2 = rand(count_nodes(parent1)-2)+1, rand(count_nodes(parent2)-2)+1
  tree1, c1 = get_node(parent1, pt1)
  tree2, c2 = get_node(parent2, pt2)
  child1, c1 = replace_node(parent1, copy_program(tree2), pt1)
  child1 = prune(child1, max_depth, terms)
  child2, c2 = replace_node(parent2, copy_program(tree1), pt2)
  child2 = prune(child2, max_depth, terms)
  return [child1, child2]
end

# replace a randomly selected node with a new random sub-tree
def mutation(parent, max_depth, functs, terms)
  random_tree = generate_random_program(max_depth/2, functs, terms)
  point = rand(count_nodes(parent))
  child, count = replace_node(parent, random_tree, point)
  child = prune(child, max_depth, terms)
  return child
end

def search(max_gens, pop_size, max_depth, bouts, p_repro, p_cross, p_mut,
    functs, terms)
  population = Array.new(pop_size) do |i|
    {:prog=>generate_random_program(max_depth, functs, terms)}
  end
  population.each{|c| c[:fitness] = fitness(c[:prog])}
  best = population.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
  max_gens.times do |gen|
    children = []
    while children.size < pop_size
      operation = rand()
      p1 = tournament_selection(population, bouts)
      c1 = {}
      if operation < p_repro
        c1[:prog] = copy_program(p1[:prog])
      elsif operation < p_repro+p_cross
        p2 = tournament_selection(population, bouts)
        c2 = {}
        c1[:prog],c2[:prog] = crossover(p1[:prog], p2[:prog], max_depth,
          terms)
        children << c2
      elsif operation < p_repro+p_cross+p_mut
        c1[:prog] = mutation(p1[:prog], max_depth, functs, terms)
      end
      children << c1 if children.size < pop_size
    end
    children.each{|c| c[:fitness] = fitness(c[:prog])}
    population = children
    population.sort!{|x,y| x[:fitness] <=> y[:fitness]}
    best = population.first if population.first[:fitness] <= best[:fitness]
    puts " > gen #{gen}, fitness=#{best[:fitness]}"
    break if best[:fitness] == 0
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  terms = ['X', 'R']
  functs = [:+, :-, :*, :/]
  # algorithm configuration
  max_gens = 100
  max_depth = 7
  pop_size = 100
  bouts = 5
  p_repro = 0.08
  p_cross = 0.90
  p_mut = 0.02
  # execute the algorithm
  best = search(max_gens, pop_size, max_depth, bouts, p_repro, p_cross,
    p_mut, functs, terms)
  puts "done! Solution: f=#{best[:fitness]}, #{print_program(best[:prog])}"
end



Listing 3.2: Genetic Programming in Ruby 



3.3.8 References 
Primary Sources 

An early work by Cramer involved the study of a Genetic Algorithm using an 
expression tree structure for representing computer programs for primitive 
mathematical operations [3]. Koza is credited with the development of 
the field of Genetic Programming. An early paper by Koza referred to 
his hierarchical genetic algorithms as an extension to the simple genetic 
algorithm that use symbolic expressions (S-expressions) as a representation 
and were applied to a range of induction-style problems [4]. The seminal
reference for the field is Koza's 1992 book on Genetic Programming [5]. 



Learn More 

The field of Genetic Programming is vast, including many books, dedicated 
conferences and thousands of publications. Koza is generally credited with 
the development and popularizing of the field, publishing a large number of 
books and papers himself. Koza provides a practical introduction to the
field as a tutorial and provides a recent overview of the broader field and
usage of the technique [9].

In addition to his seminal 1992 book, Koza has released three more
volumes in the series including volume II on Automatically Defined Functions
(ADFs) [6], volume III that considered the Genetic Programming Problem
Solver (GPPS) for automatically defining the function set and program
structure for a given problem [7], and volume IV that focuses on the
human-competitive results the technique is able to achieve in a routine manner
[8]. All books are rich with targeted and practical demonstration problem
instances.

Some additional excellent books include a text by Banzhaf et al. that
provides an introduction to the field [2], Langdon and Poli's detailed look
at the technique [10], and Poli, Langdon, and McPhee's contemporary and
practical field guide to Genetic Programming [12].

3.4 Evolution Strategies

Evolution Strategies, Evolution Strategy, Evolutionary Strategies, ES. 

3.4.1 Taxonomy 

Evolution Strategies is a global optimization algorithm and is an instance 
of an Evolutionary Algorithm from the field of Evolutionary Computa- 
tion. Evolution Strategies is a sibling technique to other Evolutionary 
Algorithms such as Genetic Algorithms (Section 3.2), Genetic Programming 
(Section 3.3), Learning Classifier Systems (Section 3.9), and Evolutionary 
Programming (Section 3.6). A popular descendant of the Evolution Strate- 
gies algorithm is the Covariance Matrix Adaptation Evolution Strategies 
(CMA-ES). 

3.4.2 Inspiration 

Evolution Strategies is inspired by the theory of evolution by means of 
natural selection. Specifically, the technique is inspired by macro-level 
or the species-level process of evolution (phenotype, hereditary, variation) 
and is not concerned with the genetic mechanisms of evolution (genome, 
chromosomes, genes, alleles). 

3.4.3 Strategy 

The objective of the Evolution Strategies algorithm is to maximize the
suitability of a collection of candidate solutions in the context of an
objective function from a domain. The objective was classically achieved
through the adoption of dynamic variation, a surrogate for descent with 
modification, where the amount of variation was adapted dynamically with 
performance-based heuristics. Contemporary approaches co-adapt param- 
eters that control the amount and bias of variation with the candidate 
solutions. 

3.4.4 Procedure 

Instances of Evolution Strategies algorithms may be concisely described with
a custom terminology in the form (μ, λ)-ES, where μ is the number of candidate
solutions in the parent generation, and λ is the number of candidate solutions
generated from the parent generation. In this configuration, the best μ of the
λ children are kept to form the next parent generation, which requires λ to be
greater than or equal to μ. In addition to the
so-called comma-selection Evolution Strategies algorithm, a plus-selection
variation may be defined, (μ + λ)-ES, where the best members of the union
of the μ and λ generations compete based on objective fitness for a position
in the next generation. The simplest configuration is the (1 + 1)-ES,
which is a type of greedy hill climbing algorithm. Algorithm 3.4.1 provides
a pseudocode listing of the (μ, λ)-ES algorithm for minimizing a cost
function. The algorithm shows the adaptation of candidate solutions that
co-adapt their own strategy parameters that influence the amount of mutation
applied to a candidate solution's descendants.
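The difference between comma-selection and plus-selection can be sketched in a few lines. This is a minimal illustration under the assumption that candidate solutions are hashes with a :fitness key and that lower fitness is better; the function names comma_selection and plus_selection are hypothetical.

```ruby
# Candidates are hashes with a :fitness key; lower is better (minimization).
# Comma-selection: the next generation is drawn from the children only.
def comma_selection(children, mu)
  children.sort_by { |c| c[:fitness] }.first(mu)
end

# Plus-selection: parents compete with their children for survival.
def plus_selection(parents, children, mu)
  (parents + children).sort_by { |c| c[:fitness] }.first(mu)
end

parents  = [{ fitness: 1.0 }, { fitness: 4.0 }]
children = [{ fitness: 2.0 }, { fitness: 3.0 }, { fitness: 5.0 }]

p comma_selection(children, 2).map { |c| c[:fitness] }
p plus_selection(parents, children, 2).map { |c| c[:fitness] }
```

In this small example the best parent (fitness 1.0) is discarded under comma-selection but survives under plus-selection, which shows why the former keeps exploring while the latter never loses its best solution.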



Algorithm 3.4.1: Pseudocode for (μ, λ) Evolution Strategies.

Input: μ, λ, ProblemSize
Output: S_best

Population ← InitializePopulation(μ, ProblemSize);
EvaluatePopulation(Population);
S_best ← GetBest(Population, 1);
while ¬StopCondition() do
    Children ← ∅;
    for i = 0 to λ do
        Parent_i ← GetParent(Population, i);
        S_i ← ∅;
        Si_problem ← Mutate(Pi_problem, Pi_strategy);
        Si_strategy ← Mutate(Pi_strategy);
        Children ← S_i;
    end
    EvaluatePopulation(Children);
    S_best ← GetBest(Children + S_best, 1);
    Population ← SelectBest(Population, Children, μ);
end
return S_best;



3.4.5 Heuristics 

• Evolution Strategies uses problem specific representations, such as 
real values for continuous function optimization. 

• The algorithm is commonly configured such that 1 ≤ μ ≤ λ.

• The ratio of μ to λ influences the amount of selection pressure
(greediness) exerted by the algorithm.

• A contemporary update to the algorithm's notation includes a ρ as
(μ/ρ, λ)-ES that specifies the number of parents that will contribute
to each new candidate solution using a recombination operator.

• A classical rule used to govern the amount of mutation (standard
deviation used in mutation for continuous function optimization) was
the 1/5-rule, where the ratio of successful mutations should be 1/5 of all
mutations. If it is greater the variance is increased, otherwise if the
ratio is less, the variance is decreased.

• The comma-selection variation of the algorithm can be good for dy- 
namic problem instances given its capability for continued exploration 
of the search space, whereas the plus-selection variation can be good 
for refinement and convergence. 
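The 1/5-rule described in the heuristics above can be sketched as a small update function. This is an illustrative sketch, not the method used in the listing that follows: the function name one_fifth_rule, the observation window, and the adjustment factor of 0.85 are assumptions made for the example.

```ruby
# Adapt a mutation standard deviation with the 1/5 success rule.
# successes: number of mutations that improved on the parent
# trials:    number of mutations attempted in the observation window
# factor:    adjustment constant in (0, 1); 0.85 is an assumed value
def one_fifth_rule(sigma, successes, trials, factor = 0.85)
  ratio = successes.to_f / trials
  return sigma / factor if ratio > 0.2  # many successes: widen the search
  return sigma * factor if ratio < 0.2  # few successes: narrow the search
  sigma                                 # exactly 1/5: leave unchanged
end

puts one_fifth_rule(1.0, 5, 10)  # success ratio 0.5, step size grows
puts one_fifth_rule(1.0, 1, 10)  # success ratio 0.1, step size shrinks
```

The rule is typically applied to a single global step size after a fixed number of mutations, in contrast to the self-adapted per-variable standard deviations used by contemporary approaches.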

3.4.6 Code Listing 

Listing 3.3 provides an example of the Evolution Strategies algorithm
implemented in the Ruby Programming Language. The demonstration
problem is an instance of a continuous function optimization that seeks
$\min f(x)$ where $f = \sum_{i=1}^{n} x_i^2$, $-5.0 \le x_i \le 5.0$ and $n = 2$. The optimal
solution for this basin function is $(v_0, \ldots, v_{n-1}) = 0.0$. The algorithm is an
implementation of Evolution Strategies based on the simple version described
by Bäck and Schwefel [2], which was also used as the basis of a detailed
empirical study [11]. The algorithm is a (30 + 20)-ES that adapts both the
problem and strategy (standard deviations) variables. More contemporary
implementations may modify the strategy variables differently, and include
an additional set of adapted strategy parameters to influence the direction
of mutation (see [7] for a concise description).

def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

# sample a Gaussian using the polar form of the Box-Muller transform
def random_gaussian(mean=0.0, stdev=1.0)
  u1 = u2 = w = 0
  begin
    u1 = 2 * rand() - 1
    u2 = 2 * rand() - 1
    w = u1 * u1 + u2 * u2
  end while w >= 1 or w == 0 # also reject w == 0 to avoid log(0)
  w = Math.sqrt((-2.0 * Math.log(w)) / w)
  return mean + (u2 * w) * stdev
end

# perturb each variable by its own standard deviation, clamped to the bounds
def mutate_problem(vector, stdevs, search_space)
  child = Array.new(vector.size)
  vector.each_with_index do |v, i|
    child[i] = v + stdevs[i] * random_gaussian()
    child[i] = search_space[i][0] if child[i] < search_space[i][0]
    child[i] = search_space[i][1] if child[i] > search_space[i][1]
  end
  return child
end

# self-adapt the standard deviations with a log-normal update
def mutate_strategy(stdevs)
  tau = Math.sqrt(2.0*stdevs.size.to_f)**-1.0
  tau_p = Math.sqrt(2.0*Math.sqrt(stdevs.size.to_f))**-1.0
  child = Array.new(stdevs.size) do |i|
    stdevs[i] * Math.exp(tau_p*random_gaussian() + tau*random_gaussian())
  end
  return child
end

def mutate(par, minmax)
  child = {}
  child[:vector] = mutate_problem(par[:vector], par[:strategy], minmax)
  child[:strategy] = mutate_strategy(par[:strategy])
  return child
end

def init_population(minmax, pop_size)
  strategy = Array.new(minmax.size) do |i|
    [0, (minmax[i][1]-minmax[i][0]) * 0.05]
  end
  # use the block form so that each member is a distinct hash
  pop = Array.new(pop_size) { {} }
  pop.each_index do |i|
    pop[i][:vector] = random_vector(minmax)
    pop[i][:strategy] = random_vector(strategy)
  end
  pop.each{|c| c[:fitness] = objective_function(c[:vector])}
  return pop
end

def search(max_gens, search_space, pop_size, num_children)
  population = init_population(search_space, pop_size)
  best = population.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
  max_gens.times do |gen|
    children = Array.new(num_children) do |i|
      mutate(population[i], search_space)
    end
    children.each{|c| c[:fitness] = objective_function(c[:vector])}
    # plus-selection: parents and children compete for survival
    union = children+population
    union.sort!{|x,y| x[:fitness] <=> y[:fitness]}
    best = union.first if union.first[:fitness] < best[:fitness]
    population = union.first(pop_size)
    puts " > gen #{gen}, fitness=#{best[:fitness]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_gens = 100
  pop_size = 30
  num_children = 20
  # execute the algorithm
  best = search(max_gens, search_space, pop_size, num_children)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:vector].inspect}"
end

Listing 3.3: Evolution Strategies in Ruby 



3.4.7 References 

Primary Sources 

Evolution Strategies was developed by three students (Bienert, Rechenberg, 
Schwefel) at the Technical University in Berlin in 1964 in an effort to 
robotically optimize an aerodynamics design problem. The seminal work 
in Evolution Strategies was Rechenberg's PhD thesis [5] that was later
published as a book [6], both in German. Many technical reports and 
papers were published by Schwefel and Rechenberg, although the seminal 
paper published in English was by Klockgether and Schwefel on the two- 
phase nozzle design problem [4]. 

Learn More 

Schwefel published his PhD dissertation [8] not long after Rechenberg, which 
was also published as a book [9], both in German. Schwefel's book was 
later translated into English and represents a classical reference for the 
technique [10]. Bäck et al. provide a classical introduction to the technique,
covering the history, development of the algorithm, and the steps that led
it to where it was in 1991 [1]. Beyer and Schwefel provide a contemporary
introduction to the field that includes a detailed history of the approach,
the developments and improvements since its inception, and an overview of
the theoretical findings that have been made [3].

3.5 Differential Evolution

Differential Evolution, DE. 

3.5.1 Taxonomy 

Differential Evolution is a Stochastic Direct Search and Global Optimiza- 
tion algorithm, and is an instance of an Evolutionary Algorithm from the 
field of Evolutionary Computation. It is related to sibling Evolutionary 
Algorithms such as the Genetic Algorithm (Section 3.2), Evolutionary Pro- 
gramming (Section 3.6), and Evolution Strategies (Section 3.4), and has 
some similarities with Particle Swarm Optimization (Section 6.2). 

3.5.2 Strategy 

The Differential Evolution algorithm involves maintaining a population of 
candidate solutions subjected to iterations of recombination, evaluation, 
and selection. The recombination approach involves the creation of new 
candidate solution components based on the weighted difference between 
two randomly selected population members added to a third population 
member. This perturbs population members relative to the spread of the 
broader population. In conjunction with selection, the perturbation effect 
self-organizes the sampling of the problem space, bounding it to known 
areas of interest. 

3.5.3 Procedure 

Differential Evolution has a specialized nomenclature that describes the
adopted configuration. This takes the form of DE/x/y/z, where x represents
the solution to be perturbed (such as random or best). The y signifies the
number of difference vectors used in the perturbation of x, where a difference
vector is the difference between two randomly selected although distinct
members of the population. Finally, z signifies the recombination operator
performed such as bin for binomial and exp for exponential.
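The listing later in this section uses binomial recombination. For comparison, the exponential operator can be sketched as follows. This is a minimal sketch of one common formulation, assuming vectors are plain Ruby arrays; the function name exp_crossover is hypothetical and is not part of the book's listing.

```ruby
# Exponential crossover: copy a contiguous (circular) run of components from
# the mutant vector into a copy of the target vector. The run starts at a
# random position and continues while rand() < cr; at least one component
# is always taken from the mutant.
def exp_crossover(target, mutant, cr)
  n = target.size
  trial = target.dup
  i = rand(n)
  copied = 0
  loop do
    trial[i] = mutant[i]
    i = (i + 1) % n
    copied += 1
    break if copied >= n || rand() >= cr
  end
  trial
end

target = [0.0, 0.0, 0.0, 0.0]
mutant = [1.0, 2.0, 3.0, 4.0]
p exp_crossover(target, mutant, 0.5)
```

Whereas the binomial operator decides independently for each component, the exponential operator inherits a single block of adjacent components, so the expected number of inherited components depends on both the crossover rate and the problem size.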

Algorithm 3.5.1 provides a pseudocode listing of the Differential Evo- 
lution algorithm for minimizing a cost function, specifically a DE/rand/- 
1/bin configuration. Algorithm 3.5.2 provides a pseudocode listing of the 
NewSample function from the Differential Evolution algorithm. 

3.5.4 Heuristics 

• Differential evolution was designed for nonlinear, non-differentiable 
continuous function optimization. 

• The weighting factor F ∈ [0, 2] controls the amplification of differential
variation; a value of 0.8 is suggested.

Algorithm 3.5.1: Pseudocode for Differential Evolution.

Input: Population_size, Problem_size, Weighting_factor, Crossover_rate
Output: S_best

Population ← InitializePopulation(Population_size, Problem_size);
EvaluatePopulation(Population);
S_best ← GetBestSolution(Population);
while ¬StopCondition() do
    NewPopulation ← ∅;
    foreach P_i ∈ Population do
        S_i ← NewSample(P_i, Population, Problem_size,
                        Weighting_factor, Crossover_rate);
        if Cost(S_i) ≤ Cost(P_i) then
            NewPopulation ← S_i;
        else
            NewPopulation ← P_i;
        end
    end
    Population ← NewPopulation;
    EvaluatePopulation(Population);
    S_best ← GetBestSolution(Population);
end
return S_best;



• The crossover weight CR ∈ [0, 1] probabilistically controls the amount
of recombination; a value of 0.9 is suggested.

• The initial population of candidate solutions should be randomly 
generated from within the space of valid solutions. 

• The popular configurations are DE/rand/1/* and DE/best/2/*. 
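To make the nomenclature concrete, the mutation step of a DE/best/2 configuration can be sketched as below. This is an illustrative sketch only: the function name de_best_2_mutant is hypothetical, the population is assumed to hold at least five members stored as hashes with :vector and :cost keys, and excluding the best member from the difference vectors is an assumption made for simplicity.

```ruby
# Build a DE/best/2 mutant: the best member is perturbed by two weighted
# difference vectors drawn from four distinct, randomly chosen members.
# pop is an array of hashes with :vector and :cost keys (lower cost is better).
def de_best_2_mutant(pop, f)
  best = pop.min_by { |c| c[:cost] }
  # sample four distinct members other than the best (needs pop.size >= 5)
  r1, r2, r3, r4 = pop.reject { |c| c.equal?(best) }.sample(4)
  Array.new(best[:vector].size) do |i|
    best[:vector][i] +
      f * (r1[:vector][i] - r2[:vector][i]) +
      f * (r3[:vector][i] - r4[:vector][i])
  end
end

pop = Array.new(6) do
  v = Array.new(3) { -5.0 + 10.0 * rand() }
  { vector: v, cost: v.sum { |x| x**2 } }
end
p de_best_2_mutant(pop, 0.8)
```

The resulting mutant would then be combined with the current population member using a recombination operator such as bin or exp, as indicated by the final position in the DE/x/y/z name.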



3.5.5 Code Listing 

Listing 3.4 provides an example of the Differential Evolution algorithm
implemented in the Ruby Programming Language. The demonstration
problem is an instance of a continuous function optimization that seeks
$\min f(x)$ where $f = \sum_{i=1}^{n} x_i^2$, $-5.0 \le x_i \le 5.0$ and $n = 3$. The optimal
solution for this basin function is $(v_0, \ldots, v_{n-1}) = 0.0$. The algorithm is an
implementation of Differential Evolution with the DE/rand/1/bin
configuration proposed by Storn and Price [9].

Algorithm 3.5.2: Pseudocode for the NewSample function.

Input: P0, Population, NP, F, CR
Output: S

repeat
    P1 ← RandomMember(Population);
until P1 ≠ P0;
repeat
    P2 ← RandomMember(Population);
until P2 ≠ P0 ∧ P2 ≠ P1;
repeat
    P3 ← RandomMember(Population);
until P3 ≠ P0 ∧ P3 ≠ P1 ∧ P3 ≠ P2;
CutPoint ← RandomPosition(NP);
S ← 0;
for i to NP do
    if i = CutPoint ∨ Rand() < CR then
        S_i ← P3_i + F × (P1_i - P2_i);
    else
        S_i ← P0_i;
    end
end
return S;



def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

# create a trial solution from p0 using the DE/rand/1/bin scheme
def de_rand_1_bin(p0, p1, p2, p3, f, cr, search_space)
  sample = {:vector=>Array.new(p0[:vector].size)}
  # position that is always taken from the perturbed vector
  cut = rand(sample[:vector].size-1) + 1
  sample[:vector].each_index do |i|
    sample[:vector][i] = p0[:vector][i]
    if (i==cut or rand() < cr)
      # weighted difference of p1 and p2 added to p3, clamped to the bounds
      v = p3[:vector][i] + f * (p1[:vector][i] - p2[:vector][i])
      v = search_space[i][0] if v < search_space[i][0]
      v = search_space[i][1] if v > search_space[i][1]
      sample[:vector][i] = v
    end
  end
  return sample
end

# choose three distinct population indices, all different from current
def select_parents(pop, current)
  p1, p2, p3 = rand(pop.size), rand(pop.size), rand(pop.size)
  p1 = rand(pop.size) until p1 != current
  p2 = rand(pop.size) until p2 != current and p2 != p1
  p3 = rand(pop.size) until p3 != current and p3 != p1 and p3 != p2
  return [p1,p2,p3]
end

def create_children(pop, minmax, f, cr)
  children = []
  pop.each_with_index do |p0, i|
    p1, p2, p3 = select_parents(pop, i)
    children << de_rand_1_bin(p0, pop[p1], pop[p2], pop[p3], f, cr, minmax)
  end
  return children
end

# each child replaces its parent only if it is at least as good
def select_population(parents, children)
  return Array.new(parents.size) do |i|
    (children[i][:cost]<=parents[i][:cost]) ? children[i] : parents[i]
  end
end

def search(max_gens, search_space, pop_size, f, cr)
  pop = Array.new(pop_size) {|i| {:vector=>random_vector(search_space)}}
  pop.each{|c| c[:cost] = objective_function(c[:vector])}
  best = pop.sort{|x,y| x[:cost] <=> y[:cost]}.first
  max_gens.times do |gen|
    children = create_children(pop, search_space, f, cr)
    children.each{|c| c[:cost] = objective_function(c[:vector])}
    pop = select_population(pop, children)
    pop.sort!{|x,y| x[:cost] <=> y[:cost]}
    best = pop.first if pop.first[:cost] < best[:cost]
    puts " > gen #{gen+1}, fitness=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 3
  search_space = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_gens = 200
  pop_size = 10*problem_size
  weightf = 0.8
  crossf = 0.9
  # execute the algorithm
  best = search(max_gens, search_space, pop_size, weightf, crossf)
  puts "done! Solution: f=#{best[:cost]}, s=#{best[:vector].inspect}"
end



Listing 3.4: Differential Evolution in Ruby 
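
As a small worked illustration of the DE/rand/1 step used in Listing 3.4,
the following sketch computes a single component of a trial vector and
clamps it to the bounds of the search space. The helper name de_component
and the sample values are assumptions made for this illustration only and
are not part of the listing.

```ruby
# Hypothetical helper (not part of Listing 3.4): one component of a
# DE/rand/1 trial vector, clamped to the bounds [min, max].
def de_component(x1, x2, x3, f, min, max)
  v = x3 + f * (x1 - x2)   # base value plus weighted difference
  [[v, min].max, max].min  # clamp to the search space
end

# 2.0 + 0.5 * (1.0 - 3.0) stays inside the bounds
puts de_component(1.0, 3.0, 2.0, 0.5, -5.0, 5.0)    # prints 1.0
# -3.0 + 0.8 * (-4.0 - 4.0) falls below the lower bound and is clamped
puts de_component(-4.0, 4.0, -3.0, 0.8, -5.0, 5.0)  # prints -5.0
```

The weighting factor scales the difference between two population members
before it is added to a third, which is why larger values of f produce
larger steps and more frequent clamping at the bounds.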

3.5.6 References

Primary Sources 

The Differential Evolution algorithm was presented by Storn and Price in
a technical report that considered DE1 and DE2 variants of the approach
applied to a suite of continuous function optimization problems [7]. An early
paper by Storn applied the approach to the optimization of an IIR-filter
(Infinite Impulse Response) [5]. A second early paper applied the approach to
a second suite of benchmark problem instances, adopting the contemporary 
nomenclature for describing the approach, including the DE/rand/1/* and 
DE/best/2/* variations [8]. The early work including technical reports and 
conference papers by Storn and Price culminated in a seminal journal article 
[9]. 

Learn More 

A classical overview of Differential Evolution was presented by Price and
Storn [2], and a terse introduction to the approach for function optimization
is presented by Storn [6]. A seminal extended description of the algorithm
with sample applications was presented by Storn and Price as a book chapter
[3]. Price, Storn, and Lampinen released a contemporary book dedicated
to Differential Evolution including theory, benchmarks, sample code, and
numerous application demonstrations [4]. Chakraborty also released a book
considering extensions to address complexities such as rotation invariance
and stopping criteria [1].

3.6 Evolutionary Programming

Evolutionary Programming, EP. 

3.6.1 Taxonomy 

Evolutionary Programming is a Global Optimization algorithm and is 
an instance of an Evolutionary Algorithm from the field of Evolutionary 
Computation. The approach is a sibling of other Evolutionary Algorithms 
such as the Genetic Algorithm (Section 3.2), and Learning Classifier Systems 
(Section 3.9). It is sometimes confused with Genetic Programming given 
the similarity in name (Section 3.3), and more recently it shows a strong 
functional similarity to Evolution Strategies (Section 3.4). 

3.6.2 Inspiration 

Evolutionary Programming is inspired by the theory of evolution by means
of natural selection. Specifically, the technique is inspired by the
macro-level, or species-level, process of evolution (phenotype, heredity,
variation) and is not concerned with the genetic mechanisms of evolution
(genome, chromosomes, genes, alleles).

3.6.3 Metaphor 

A population of a species reproduces, creating progeny with small
phenotypical variation. The progeny and the parents compete based on their
suitability to the environment, where the generally more fit members
constitute the subsequent generation and are provided with the opportunity
to reproduce themselves. This process repeats, improving the adaptive fit
between the species and the environment.

3.6.4 Strategy 

The objective of the Evolutionary Programming algorithm is to maximize the
suitability of a collection of candidate solutions in the context of an objective
function from the domain. This objective is pursued by using an adaptive
model with surrogates for the processes of evolution, specifically heredity
(reproduction with variation) under competition. The representation used
for candidate solutions is directly assessable by a cost or objective function
from the domain.

3.6.5 Procedure 

Algorithm 3.6.1 provides a pseudocode listing of the Evolutionary
Programming algorithm for minimizing a cost function.

Algorithm 3.6.1: Pseudocode for Evolutionary Programming.
Input: Population_size, ProblemSize, BoutSize
Output: S_best
 1 Population ← InitializePopulation(Population_size, ProblemSize);
 2 EvaluatePopulation(Population);
 3 S_best ← GetBestSolution(Population);
 4 while ¬StopCondition() do
 5     Children ← ∅;
 6     foreach Parent_i ∈ Population do
 7         Child_i ← Mutate(Parent_i);
 8         Children ← Child_i;
 9     end
10     EvaluatePopulation(Children);
11     S_best ← GetBestSolution(Children, S_best);
12     Union ← Population + Children;
13     foreach S_i ∈ Union do
14         for 1 to BoutSize do
15             S_j ← RandomSelection(Union);
16             if Cost(S_i) < Cost(S_j) then
17                 Si_wins ← Si_wins + 1;
18             end
19         end
20     end
21     Population ← SelectBestByWins(Union, Population_size);
22 end
23 return S_best;



3.6.6 Heuristics 

• The representation for candidate solutions should be domain specific, 
such as real numbers for continuous function optimization. 

• The sample size (bout size) for tournament selection during competi- 
tion is commonly between 5% and 10% of the population size. 

• Evolutionary Programming traditionally only uses the mutation opera- 
tor to create new candidate solutions from existing candidate solutions. 
The crossover operator that is used in some other Evolutionary Algo- 
rithms is not employed in Evolutionary Programming. 

• Evolutionary Programming is concerned with the linkage between
parent and child candidate solutions and is not concerned with surrogates
for genetic mechanisms.

• Continuous function optimization is a popular application for the
approach, where real-valued representations are used with a
Gaussian-based mutation operator.

• The mutation-specific parameters used in the application of the algo- 
rithm to continuous function optimization can be adapted in concert 
with the candidate solutions [4]. 
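
The bout size heuristic and the counting of wins during competition can be
sketched as follows. This is a minimal illustration only: the helper names
bout_size_for and count_wins, the default fraction, and the fixed list of
opponent costs are assumptions made for the example, and a real run would
draw the opponents at random from the union of parents and children.

```ruby
# Hypothetical helper: bout size as a fraction of the population size
# (5% by default), never less than one bout.
def bout_size_for(pop_size, fraction=0.05)
  [1, (pop_size * fraction).round].max
end

# Hypothetical helper: number of opponents a candidate beats when
# minimizing, that is, opponents with a strictly higher cost.
def count_wins(cost, opponent_costs)
  opponent_costs.count { |other| cost < other }
end

puts bout_size_for(100)        # prints 5
puts bout_size_for(100, 0.10)  # prints 10
# the candidate beats 3.0, 9.1 and 2.2 but not 0.2 or the tie at 1.5
puts count_wins(1.5, [0.2, 3.0, 1.5, 9.1, 2.2])  # prints 3
```

Because ties do not count as wins, a candidate that is drawn against a copy
of itself gains nothing from that bout.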

3.6.7 Code Listing 

Listing 3.5 provides an example of the Evolutionary Programming algorithm
implemented in the Ruby Programming Language. The demonstration
problem is an instance of a continuous function optimization that seeks
$\min f(x)$ where $f = \sum_{i=1}^{n} x_i^2$, $-5.0 \leq x_i \leq 5.0$ and
$n = 2$. The optimal solution for this basin function is
$(v_0, \ldots, v_{n-1}) = 0.0$. The algorithm is
an implementation of Evolutionary Programming based on the classical
implementation for continuous function optimization by Fogel et al. [4] with
per-variable adaptive variance based on Fogel's description for a self-adaptive
variation on page 160 of his 1995 book [3].

def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

# sample a Gaussian value using the polar form of the Box-Muller method
def random_gaussian(mean=0.0, stdev=1.0)
  u1 = u2 = w = 0
  begin
    u1 = 2 * rand() - 1
    u2 = 2 * rand() - 1
    w = u1 * u1 + u2 * u2
  end while w >= 1
  w = Math.sqrt((-2.0 * Math.log(w)) / w)
  return mean + (u2 * w) * stdev
end

def mutate(candidate, search_space)
  child = {:vector=>[], :strategy=>[]}
  candidate[:vector].each_with_index do |v_old, i|
    s_old = candidate[:strategy][i]
    v = v_old + s_old * random_gaussian()
    v = search_space[i][0] if v < search_space[i][0]
    v = search_space[i][1] if v > search_space[i][1]
    child[:vector] << v
    # self-adapt the per-variable strategy parameter
    child[:strategy] << s_old + random_gaussian() * s_old.abs**0.5
  end
  return child
end

def tournament(candidate, population, bout_size)
  candidate[:wins] = 0
  bout_size.times do |i|
    other = population[rand(population.size)]
    candidate[:wins] += 1 if candidate[:fitness] < other[:fitness]
  end
end

def init_population(minmax, pop_size)
  strategy = Array.new(minmax.size) do |i|
    [0, (minmax[i][1]-minmax[i][0]) * 0.05]
  end
  # the block form gives each candidate its own Hash; Array.new(pop_size, {})
  # would make every element refer to the same Hash
  pop = Array.new(pop_size) { Hash.new }
  pop.each_index do |i|
    pop[i][:vector] = random_vector(minmax)
    pop[i][:strategy] = random_vector(strategy)
  end
  pop.each{|c| c[:fitness] = objective_function(c[:vector])}
  return pop
end

def search(max_gens, search_space, pop_size, bout_size)
  population = init_population(search_space, pop_size)
  population.each{|c| c[:fitness] = objective_function(c[:vector])}
  best = population.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
  max_gens.times do |gen|
    children = Array.new(pop_size) {|i| mutate(population[i], search_space)}
    children.each{|c| c[:fitness] = objective_function(c[:vector])}
    children.sort!{|x,y| x[:fitness] <=> y[:fitness]}
    best = children.first if children.first[:fitness] < best[:fitness]
    union = children+population
    union.each{|c| tournament(c, union, bout_size)}
    # keep the candidates with the most wins
    union.sort!{|x,y| y[:wins] <=> x[:wins]}
    population = union.first(pop_size)
    puts " > gen #{gen}, fitness=#{best[:fitness]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_gens = 200
  pop_size = 100
  bout_size = 5
  # execute the algorithm
  best = search(max_gens, search_space, pop_size, bout_size)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:vector].inspect}"
end



Listing 3.5: Evolutionary Programming in Ruby

3.6.8 References

Primary Sources 

Evolutionary Programming was developed by Lawrence Fogel, outlined in 
early papers (such as [5]) and later became the focus of his PhD dissertation 
[6]. Fogel focused on the use of an evolutionary process for the development 
of control systems using Finite State Machine (FSM) representations. Fogel's 
early work on Evolutionary Programming culminated in a book (co-authored 
with Owens and Walsh) that elaborated the approach, focusing on the 
evolution of state machines for the prediction of symbols in time series data 
[9]. 

Learn More 

The field of Evolutionary Programming lay relatively dormant for 30 years 
until it was revived by Fogel's son, David. Early works considered the 
application of Evolutionary Programming to control systems [11], and 
later function optimization (system identification) culminating in a book 
on the approach [1], and David Fogel's PhD dissertation [2]. Lawrence 
Fogel collaborated in the revival of the technique, including reviews [7, 8] 
and extensions on what became the focus of the approach on function 
optimization [4]. 

Yao et al. provide a seminal study of Evolutionary Programming propos- 
ing an extension and racing it against the classical approach on a large 
number of test problems [12]. Finally, Porto provides an excellent
contemporary overview of the field and the technique [10].

3.7 Grammatical Evolution

Grammatical Evolution, GE. 

3.7.1 Taxonomy 

Grammatical Evolution is a Global Optimization technique and an instance 
of an Evolutionary Algorithm from the field of Evolutionary Computation. 
It may also be considered an algorithm for Automatic Programming. Gram- 
matical Evolution is related to other Evolutionary Algorithms for evolving 
programs such as Genetic Programming (Section 3.3) and Gene Expression 
Programming (Section 3.8), as well as the classical Genetic Algorithm that 
uses binary strings (Section 3.2). 

3.7.2 Inspiration 

The Grammatical Evolution algorithm is inspired by the biological process 
used for generating a protein from genetic material as well as the broader 
genetic evolutionary process. The genome is comprised of DNA as a string 
of building blocks that are transcribed to RNA. RNA codons are in turn 
translated into sequences of amino acids and used in the protein. The 
resulting protein in its environment is the phenotype. 

3.7.3 Metaphor 

The phenotype is a computer program that is created from a binary
string-based genome. The genome is decoded into a sequence of integers that
are in turn mapped onto pre-defined rules that make up the program. The
mapping from genotype to the phenotype is a one-to-many process that
uses a wrapping feature. This is like the biological process observed in many
bacteria, viruses, and mitochondria, where the same genetic material is used
in the expression of different genes. The mapping adds robustness to the
process both in the ability to adopt structure-agnostic genetic operators used
during the evolutionary process on the sub-symbolic representation and the
transcription of well-formed executable programs from the representation.

3.7.4 Strategy 

The objective of Grammatical Evolution is to adapt an executable program
to a problem-specific objective function. This is achieved through an iterative
process with surrogates of evolutionary mechanisms such as descent with
variation, genetic mutation and recombination, and genetic transcription
and gene expression. A population of programs is evolved in a sub-symbolic
form as variable-length binary strings and mapped to a symbolic
and well-structured form as a context-free grammar for execution.

3.7.5 Procedure

A grammar is defined in Backus Normal Form (BNF), which is a context free 
grammar expressed as a series of production rules comprised of terminals 
and non-terminals. A variable-length binary string representation is used 
for the optimization process. Bits are read from a candidate solution's
genome in blocks of 8 called a codon, and decoded to an integer (in the
range between 0 and $2^8 - 1$). If the end of the binary string is reached
when reading integers, the reading process loops back to the start of the
string, effectively creating a circular genome. The integers are mapped to
expressions from the BNF until a complete syntactically correct expression
is formed. This may not use a solution's entire genome, or may use the decoded
genome more than once given its circular nature. Algorithm 3.7.1 provides
a pseudocode listing of the Grammatical Evolution algorithm for minimizing 
a cost function. 
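
The decoding and wrapping described above can be sketched in isolation.
The following is a minimal illustration and not the method of Listing 3.6:
the function names decode and map_program, the toy grammar, the
most-significant-bit-first codon ordering, and the max_steps guard are all
assumptions made for this example.

```ruby
# Decode a bitstring into integers, one per codon. Codons are read most
# significant bit first here, which is an assumption of this sketch.
def decode(bitstring, codon_bits=8)
  bitstring.scan(/.{#{codon_bits}}/).map { |codon| codon.to_i(2) }
end

# Hypothetical toy grammar: each non-terminal maps to its productions.
GRAMMAR = {
  "<expr>" => ["<expr><op><expr>", "<var>"],
  "<op>"   => ["+", "*"],
  "<var>"  => ["x", "1.0"]
}

# Expand the leftmost non-terminal until none remain. Each integer picks a
# production by modulo, and the integer list is reused from the start when
# it runs out (wrapping). max_steps guards against endless expansion.
def map_program(integers, grammar, start="<expr>", max_steps=50)
  program, offset = start, 0
  max_steps.times do
    symbol = program[/<\w+>/]
    return program if symbol.nil?
    rules = grammar[symbol]
    choice = rules[integers[offset % integers.size] % rules.size]
    program = program.sub(symbol, choice)
    offset += 1
  end
  nil # no complete program within max_steps
end

ints = decode("000000100000001100000100")
puts ints.inspect                # prints [2, 3, 4]
puts map_program(ints, GRAMMAR)  # prints x+x
```

Three codons are enough to produce the program here because the genome is
read twice: six expansions are needed and the reading position wraps back
to the first codon after the third. A genome such as [2, 3] never
completes under this grammar, which is why a limit on the number of
expansions (or on wrapping) is needed in practice.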

3.7.6 Heuristics 

• Grammatical Evolution was designed to optimize programs (such as 
mathematical equations) to specific cost functions. 

• Classical genetic operators used by the Genetic Algorithm may be 
used in the Grammatical Evolution algorithm, such as point mutations 
and one-point crossover. 

• Codons (groups of bits mapped to an integer) are commonly fixed at
8 bits, providing a range of integers $\in [0, 2^8 - 1]$ that is scaled to the
range of rules using a modulo function.

• Additional genetic operators may be used with variable-length rep- 
resentations such as codon segments, duplication (add to the end), 
number of codons selected at random, and deletion. 

3.7.7 Code Listing 

Listing 3.6 provides an example of the Grammatical Evolution algorithm
implemented in the Ruby Programming Language based on the version
described by O'Neill and Ryan [5]. The demonstration problem is an
instance of symbolic regression $f(x) = x^4 + x^3 + x^2 + x$, where
$x \in [1, 10]$. The grammar used in this problem is:

• Non-terminals: N = {expr, op, pre_op}

• Terminals: T = {+, −, ÷, ×, x, 1.0}

• Expression (program): S = <expr>

The production rules for the grammar in BNF are:

• <expr> ::= <expr><op><expr>, (<expr><op><expr>), <pre_op>(<expr>),
<var>

• <op> ::= +, −, ÷, ×

• <var> ::= x, 1.0

Algorithm 3.7.1: Pseudocode for Grammatical Evolution.
Input: Grammar, Codon_numbits, Population_size, P_crossover,
       P_mutation, P_delete, P_duplicate
Output: S_best
 1 Population ← InitializePopulation(Population_size, Codon_numbits);
 2 foreach S_i ∈ Population do
 3     Si_integers ← Decode(Si_bitstring, Codon_numbits);
 4     Si_program ← Map(Si_integers, Grammar);
 5     Si_cost ← Execute(Si_program);
 6 end
 7 S_best ← GetBestSolution(Population);
 8 while ¬StopCondition() do
 9     Parents ← SelectParents(Population, Population_size);
10     Children ← ∅;
11     foreach Parent_i, Parent_j ∈ Parents do
12         S_i ← Crossover(Parent_i, Parent_j, P_crossover);
13         Si_bitstring ← CodonDeletion(Si_bitstring, P_delete);
14         Si_bitstring ← CodonDuplication(Si_bitstring, P_duplicate);
15         Si_bitstring ← Mutate(Si_bitstring, P_mutation);
16         Children ← S_i;
17     end
18     foreach S_i ∈ Children do
19         Si_integers ← Decode(Si_bitstring, Codon_numbits);
20         Si_program ← Map(Si_integers, Grammar);
21         Si_cost ← Execute(Si_program);
22     end
23     S_best ← GetBestSolution(Children);
24     Population ← Replace(Population, Children);
25 end
26 return S_best;

The algorithm uses point mutation and a codon-respecting one-point
crossover operator. Binary tournament selection is used to determine
the parent population's contribution to the subsequent generation. Binary
strings are decoded to integers using an unsigned binary encoding. Candidate
solutions are then mapped directly into executable Ruby code and executed.
A given candidate solution is evaluated by comparing its output against the
target function and taking the mean of the absolute errors over a number of
trials.

The probabilities of point mutation, codon deletion, and codon duplication
are hard coded relative to each solution, although they should
be parameters of the algorithm. In this case they are heuristically defined
as $\frac{1}{L}$, $\frac{0.5}{NC}$ and $\frac{1.0}{NC}$ respectively, where
$L$ is the total number of bits, and $NC$ is the number of codons in a given
candidate solution.

Solutions are evaluated by generating a number of random samples from 
the domain and calculating the mean error of the program to the expected 
outcome. Programs that contain a single term or those that return an 
invalid (NaN) or infinite result are penalized with an enormous error value. 
The implementation uses a maximum depth in the expression tree, whereas 
traditionally such deep expression trees are marked as invalid. Programs 
that resolve to a single expression that returns the output are penalized. 

def binary_tournament(pop)
  i, j = rand(pop.size), rand(pop.size)
  j = rand(pop.size) while j==i
  return (pop[i][:fitness] < pop[j][:fitness]) ? pop[i] : pop[j]
end

def point_mutation(bitstring, rate=1.0/bitstring.size.to_f)
  child = ""
  bitstring.size.times do |i|
    bit = bitstring[i].chr
    child << ((rand()<rate) ? ((bit=='1') ? "0" : "1") : bit)
  end
  return child
end

def one_point_crossover(parent1, parent2, codon_bits, p_cross=0.30)
  return ""+parent1[:bitstring] if rand()>=p_cross
  # choose the cut on a codon boundary, measured on the bitstrings
  # (the size of the candidate Hash itself is not the genome length)
  sizes = [parent1[:bitstring].size, parent2[:bitstring].size]
  cut = rand(sizes.min/codon_bits)
  cut *= codon_bits
  p2size = parent2[:bitstring].size
  return parent1[:bitstring][0...cut]+parent2[:bitstring][cut...p2size]
end

def codon_duplication(bitstring, codon_bits, rate=1.0/codon_bits.to_f)
  return bitstring if rand() >= rate
  codons = bitstring.size/codon_bits
  return bitstring + bitstring[rand(codons)*codon_bits, codon_bits]
end

def codon_deletion(bitstring, codon_bits, rate=0.5/codon_bits.to_f)
  return bitstring if rand() >= rate
  codons = bitstring.size/codon_bits
  off = rand(codons)*codon_bits
  return bitstring[0...off] + bitstring[off+codon_bits...bitstring.size]
end

def reproduce(selected, pop_size, p_cross, codon_bits)
  children = []
  selected.each_with_index do |p1, i|
    p2 = (i.modulo(2)==0) ? selected[i+1] : selected[i-1]
    p2 = selected[0] if i == selected.size-1
    child = {}
    child[:bitstring] = one_point_crossover(p1, p2, codon_bits, p_cross)
    child[:bitstring] = codon_deletion(child[:bitstring], codon_bits)
    child[:bitstring] = codon_duplication(child[:bitstring], codon_bits)
    child[:bitstring] = point_mutation(child[:bitstring])
    children << child
    break if children.size == pop_size
  end
  return children
end

def random_bitstring(num_bits)
  return (0...num_bits).inject(""){|s,i| s<<((rand<0.5) ? "1" : "0")}
end

def decode_integers(bitstring, codon_bits)
  ints = []
  (bitstring.size/codon_bits).times do |off|
    codon = bitstring[off*codon_bits, codon_bits]
    sum = 0
    codon.size.times do |i|
      sum += ((codon[i].chr=='1') ? 1 : 0) * (2 ** i);
    end
    ints << sum
  end
  return ints
end

def map(grammar, integers, max_depth)
  done, offset, depth = false, 0, 0
  symbolic_string = grammar["S"]
  begin
    done = true
    grammar.keys.each do |key|
      symbolic_string = symbolic_string.gsub(key) do |k|
        done = false
        # force terminals once the maximum depth is reached
        set = (k=="EXP" && depth>=max_depth-1) ? grammar["VAR"] : grammar[k]
        integer = integers[offset].modulo(set.size)
        # wrap back to the first integer at the end of the genome
        offset = (offset==integers.size-1) ? 0 : offset+1
        set[integer]
      end
    end
    depth += 1
  end until done
  return symbolic_string
end

def target_function(x)
  return x**4.0 + x**3.0 + x**2.0 + x
end

def sample_from_bounds(bounds)
  return bounds[0] + ((bounds[1] - bounds[0]) * rand())
end

def cost(program, bounds, num_trials=30)
  return 9999999 if program.strip == "INPUT"
  sum_error = 0.0
  num_trials.times do
    x = sample_from_bounds(bounds)
    expression = program.gsub("INPUT", x.to_s)
    begin score = eval(expression) rescue score = 0.0/0.0 end
    return 9999999 if score.nan? or score.infinite?
    sum_error += (score - target_function(x)).abs
  end
  return sum_error / num_trials.to_f
end

def evaluate(candidate, codon_bits, grammar, max_depth, bounds)
  candidate[:integers] = decode_integers(candidate[:bitstring], codon_bits)
  candidate[:program] = map(grammar, candidate[:integers], max_depth)
  candidate[:fitness] = cost(candidate[:program], bounds)
end

def search(max_gens, pop_size, codon_bits, num_bits, p_cross, grammar,
    max_depth, bounds)
  pop = Array.new(pop_size) {|i| {:bitstring=>random_bitstring(num_bits)}}
  pop.each{|c| evaluate(c, codon_bits, grammar, max_depth, bounds)}
  best = pop.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
  max_gens.times do |gen|
    selected = Array.new(pop_size){|i| binary_tournament(pop)}
    children = reproduce(selected, pop_size, p_cross, codon_bits)
    children.each{|c| evaluate(c, codon_bits, grammar, max_depth, bounds)}
    children.sort!{|x,y| x[:fitness] <=> y[:fitness]}
    best = children.first if children.first[:fitness] <= best[:fitness]
    pop=(children+pop).sort{|x,y| x[:fitness]<=>y[:fitness]}.first(pop_size)
    puts " > gen=#{gen}, f=#{best[:fitness]}, s=#{best[:bitstring]}"
    break if best[:fitness] == 0.0
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  grammar = {"S"=>"EXP",
    "EXP"=>[" EXP BINARY EXP ", " (EXP BINARY EXP) ", " VAR "],
    "BINARY"=>["+", "-", "/", "*" ],
    "VAR"=>["INPUT", "1.0"]}
  bounds = [1, 10]
  # algorithm configuration
  max_depth = 7
  max_gens = 50
  pop_size = 100
  codon_bits = 4
  num_bits = 10*codon_bits
  p_cross = 0.30
  # execute the algorithm
  best = search(max_gens, pop_size, codon_bits, num_bits, p_cross, grammar,
    max_depth, bounds)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:program]}"
end

Listing 3.6: Grammatical Evolution in Ruby 



3.7.8 References 
Primary Sources 

Grammatical Evolution was proposed by Ryan, Collins and O'Neill in a 
seminal conference paper that applied the approach to a symbolic regression 
problem [7]. The approach was born out of the desire for syntax preservation 
while evolving programs using the Genetic Programming algorithm. This 
seminal work was followed by application papers for a symbolic integration 
problem [2, 3] and solving trigonometric identities [8]. 

Learn More 

O'Neill and Ryan provide a high-level introduction to Grammatical Evolu- 
tion and early demonstration applications [4]. The same authors provide 
a thorough introduction to the technique and overview of the state of the 
field [5]. O'Neill and Ryan present a seminal reference for Grammatical
Evolution in their book [6]. A second more recent book considers extensions
to the approach improving its capability on dynamic problems [1].

3.8 Gene Expression Programming

Gene Expression Programming, GEP. 

3.8.1 Taxonomy 

Gene Expression Programming is a Global Optimization algorithm and an
Automatic Programming technique, and it is an instance of an Evolutionary
Algorithm from the field of Evolutionary Computation. It is a sibling
of other Evolutionary Algorithms such as the Genetic Algorithm
(Section 3.2) as well as other Evolutionary Automatic Programming techniques
such as Genetic Programming (Section 3.3) and Grammatical Evolution
(Section 3.7).

3.8.2 Inspiration 

Gene Expression Programming is inspired by the replication and expression 
of the DNA molecule, specifically at the gene level. The expression of a 
gene involves the transcription of its DNA to RNA which in turn forms 
amino acids that make up proteins in the phenotype of an organism. The 
DNA building blocks are subjected to mechanisms of variation (mutations
such as copying errors) as well as recombination during sexual reproduction.

3.8.3 Metaphor 

Gene Expression Programming uses a linear genome as the basis for genetic 
operators such as mutation, recombination, inversion, and transposition. 
The genome is comprised of chromosomes and each chromosome is comprised 
of genes that are translated into an expression tree to solve a given problem. 
The robust gene definition means that genetic operators can be applied to 
the sub-symbolic representation without concern for the structure of the 
resultant gene expression, providing separation of genotype and phenotype. 

3.8.4 Strategy 

The objective of the Gene Expression Programming algorithm is to im- 
prove the adaptive fit of an expressed program in the context of a problem 
specific cost function. This is achieved through the use of an evolutionary 
process that operates on a sub-symbolic representation of candidate solu- 
tions using surrogates for the processes (descent with modification) and 
mechanisms (genetic recombination, mutation, inversion, transposition, and 
gene expression) of evolution.

3.8.5 Procedure

A candidate solution is represented as a linear string of symbols called 
Karva notation or a K-expression, where each symbol maps to a function or 
terminal node. The linear representation is mapped to an expression tree in
a breadth-first manner. A K-expression has fixed length and is comprised
of one or more sub-expressions (genes), which are also defined with a fixed 
length. A gene is comprised of two sections, a head which may contain 
any function or terminal symbols, and a tail section that may only contain 
terminal symbols. Each gene will always translate to a syntactically correct 
expression tree, where the tail portion of the gene provides a genetic buffer 
which ensures closure of the expression. 
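
The breadth-first mapping from a K-expression to an expression tree can be
sketched as follows. This is a minimal illustration under stated
assumptions: the names FUNCTIONS, decode_breadth_first and to_infix are
hypothetical, every function is assumed to take exactly two arguments, and
the genome is assumed to have a valid head and tail so that the symbols
never run out.

```ruby
# Hypothetical function set: every function here takes two arguments.
FUNCTIONS = ["+", "-", "*", "/"]

# Build an expression tree from a K-expression, filling the tree level by
# level (breadth first). Symbols left over at the end are simply unused.
def decode_breadth_first(genome)
  symbols = genome.chars
  root = {:node => symbols.shift}
  queue = [root]
  until queue.empty?
    current = queue.shift
    next unless FUNCTIONS.include?(current[:node])
    current[:left]  = {:node => symbols.shift}
    current[:right] = {:node => symbols.shift}
    queue.push(current[:left], current[:right])
  end
  root
end

# Render the tree depth first as an infix expression string.
def to_infix(tree)
  return tree[:node] unless tree[:left]
  "(#{to_infix(tree[:left])} #{tree[:node]} #{to_infix(tree[:right])})"
end

# Head "+*x" (length 3) followed by tail "xxxx" (length 3*(2-1)+1 = 4).
puts to_infix(decode_breadth_first("+*xxxxx"))  # prints ((x * x) + x)
```

Only five of the seven symbols are consumed in this example. The unused
tail symbols are the genetic buffer: if a mutation turned the third symbol
into a function, the tail would supply the two extra terminals needed to
close the expression.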

Algorithm 3.8.1 provides a pseudocode listing of the Gene Expression 
Programming algorithm for minimizing a cost function. 



Algorithm 3.8.1: Pseudocode for GEP.
Input: Grammar, Population_size, Head_length, Tail_length, P_crossover,
       P_mutation
Output: S_best
 1 Population ← InitializePopulation(Population_size, Grammar,
       Head_length, Tail_length);
 2 foreach S_i ∈ Population do
 3     Si_program ← DecodeBreadthFirst(Si_genome, Grammar);
 4     Si_cost ← Execute(Si_program);
 5 end
 6 S_best ← GetBestSolution(Population);
 7 while ¬StopCondition() do
 8     Parents ← SelectParents(Population, Population_size);
 9     Children ← ∅;
10     foreach Parent_1, Parent_2 ∈ Parents do
11         Si_genome ← Crossover(Parent_1, Parent_2, P_crossover);
12         Si_genome ← Mutate(Si_genome, P_mutation);
13         Children ← S_i;
14     end
15     foreach S_i ∈ Children do
16         Si_program ← DecodeBreadthFirst(Si_genome, Grammar);
17         Si_cost ← Execute(Si_program);
18     end
19     Population ← Replace(Population, Children);
20     S_best ← GetBestSolution(Children);
21 end
22 return S_best;

3.8.6 Heuristics

• The length of a chromosome is defined by the number of genes, where
a gene length is defined by $h + t$. The $h$ is a user-defined parameter
(such as 10), and $t$ is defined as $t = h(n - 1) + 1$, where $n$ represents
the maximum arity of functional nodes in the expression (such as 2 if
the arithmetic functions ×, ÷, −, + are used).

• The mutation operator substitutes expressions along the genome, 
although must respect the gene rules such that function and terminal 
nodes are mutated in the head of genes, whereas only terminal nodes 
are substituted in the tail of genes. 

• Crossover occurs between two selected parents from the population 
and can occur based on a one-point cross, two point cross, or a gene- 
based approach where genes are selected from the parents with uniform 
probability. 

• An inversion operator may be used with a low probability that reverses
a small sequence of symbols (1-3) within a section of a gene (tail or
head).

• A transposition operator may be used that has a number of different
modes, including: duplicating a small sequence (1-3 symbols) from somewhere
on a gene to the head, duplicating a small sequence on a gene to the root
of the gene, and moving entire genes within the chromosome. In the case
of intra-gene transpositions, the sequence in the head of the gene is
moved down to accommodate the copied sequence and the length of
the head is truncated to maintain consistent gene sizes.

• A '?' may be included in the terminal set that represents a numeric
constant from an array that is evolved on the end of the genome. The
constants are read from the end of the genome and are substituted for
'?' as the expression tree is created (in breadth-first order). Finally the
numeric constants are used as array indices in yet another chromosome
of numerical values which are substituted into the expression tree.

• Mutation is low (such as $\frac{1}{L}$), selection can be any of the
classical approaches (such as roulette wheel or tournament), and crossover
rates are typically high (0.7 of offspring).

• Use multiple sub-expressions linked together on hard problems when
one gene is not sufficient to address the problem. The sub-expressions
are linked using link expressions which are function nodes that are 
either statically defined (such as a conjunction) or evolved on the 
genome with the genes. 
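
The gene length rule from the first heuristic can be checked with a few
lines of arithmetic. The helper names tail_length and gene_length are
assumptions made for this illustration.

```ruby
# Tail length t = h(n-1)+1 for head length h and maximum function arity n.
def tail_length(head_length, max_arity)
  head_length * (max_arity - 1) + 1
end

# Gene length is the head length plus the tail length.
def gene_length(head_length, max_arity)
  head_length + tail_length(head_length, max_arity)
end

# A head of 10 symbols with binary arithmetic functions.
puts tail_length(10, 2)  # prints 11
puts gene_length(10, 2)  # prints 21
```

The tail is sized for the worst case in which every head symbol is a
function of maximum arity, so there are always enough terminals to close
the expression tree.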

3.8.7 Code Listing

Listing 3.7 provides an example of the Gene Expression Programming
algorithm implemented in the Ruby Programming Language based on the
seminal version proposed by Ferreira [1]. The demonstration problem is an
instance of symbolic regression $f(x) = x^4 + x^3 + x^2 + x$, where
$x \in [1, 10]$. The grammar used in this problem is:
Functions: $F = \{+, -, \div, \times\}$ and Terminals: $T = \{x\}$.

The algorithm uses binary tournament selection, uniform crossover 
and point mutations. The K-expression is decoded to an expression tree 
in a breadth-first manner, which is then parsed depth first as a Ruby 
expression string for display and direct evaluation. Solutions are evaluated 
by generating a number of random samples from the domain and calculating 
the mean error of the program to the expected outcome. Programs that 
contain a single term or those that return an invalid (NaN) or infinite result 
are penalized with an enormous error value. 

def binary_tournament(pop)
  # pick two random members; the lower (better) fitness wins
  i, j = rand(pop.size), rand(pop.size)
  return (pop[i][:fitness] < pop[j][:fitness]) ? pop[i] : pop[j]
end

def point_mutation(grammar, genome, head_length, rate=1.0/genome.size.to_f)
  child = ""
  genome.size.times do |i|
    bit = genome[i].chr
    if rand() < rate
      if i < head_length
        # head positions may hold functions or terminals
        selection = (rand() < 0.5) ? grammar["FUNC"] : grammar["TERM"]
        bit = selection[rand(selection.size)]
      else
        # tail positions may only hold terminals
        bit = grammar["TERM"][rand(grammar["TERM"].size)]
      end
    end
    child << bit
  end
  return child
end

def crossover(parent1, parent2, rate)
  return ""+parent1 if rand()>=rate
  child = ""
  parent1.size.times do |i|
    child << ((rand()<0.5) ? parent1[i] : parent2[i])
  end
  return child
end

def reproduce(grammar, selected, pop_size, p_crossover, head_length)
  children = []
  selected.each_with_index do |p1, i|
    # pair each parent with its neighbour in the selected list
    p2 = (i.modulo(2)==0) ? selected[i+1] : selected[i-1]
    p2 = selected[0] if i == selected.size-1
    child = {}
    child[:genome] = crossover(p1[:genome], p2[:genome], p_crossover)
    child[:genome] = point_mutation(grammar, child[:genome], head_length)
    children << child
  end
  return children
end

def random_genome(grammar, head_length, tail_length)
  s = ""
  head_length.times do
    selection = (rand() < 0.5) ? grammar["FUNC"] : grammar["TERM"]
    s << selection[rand(selection.size)]
  end
  tail_length.times { s << grammar["TERM"][rand(grammar["TERM"].size)] }
  return s
end

def target_function(x)
  return x**4.0 + x**3.0 + x**2.0 + x
end

def sample_from_bounds(bounds)
  return bounds[0] + ((bounds[1] - bounds[0]) * rand())
end

def cost(program, bounds, num_trials=30)
  errors = 0.0
  num_trials.times do
    x = sample_from_bounds(bounds)
    expression, score = program.gsub("x", x.to_s), 0.0
    begin score = eval(expression) rescue score = 0.0/0.0 end
    # penalize programs that produce an invalid or infinite result
    return 9999999 if score.nan? or score.infinite?
    errors += (score - target_function(x)).abs
  end
  return errors / num_trials.to_f
end

def mapping(genome, grammar)
  # decode the K-expression into a tree, breadth first
  off, queue = 0, []
  root = {}
  root[:node] = genome[off].chr; off+=1
  queue.push(root)
  while !queue.empty? do
    current = queue.shift
    if grammar["FUNC"].include?(current[:node])
      current[:left] = {}
      current[:left][:node] = genome[off].chr; off+=1
      queue.push(current[:left])
      current[:right] = {}
      current[:right][:node] = genome[off].chr; off+=1
      queue.push(current[:right])
    end
  end
  return root
end

def tree_to_string(exp)
  return exp[:node] if (exp[:left].nil? or exp[:right].nil?)
  left = tree_to_string(exp[:left])
  right = tree_to_string(exp[:right])
  return "(#{left} #{exp[:node]} #{right})"
end

def evaluate(candidate, grammar, bounds)
  candidate[:expression] = mapping(candidate[:genome], grammar)
  candidate[:program] = tree_to_string(candidate[:expression])
  candidate[:fitness] = cost(candidate[:program], bounds)
end

def search(grammar, bounds, h_length, t_length, max_gens, pop_size, p_cross)
  pop = Array.new(pop_size) do
    {:genome=>random_genome(grammar, h_length, t_length)}
  end
  pop.each{|c| evaluate(c, grammar, bounds)}
  best = pop.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
  max_gens.times do |gen|
    # one binary tournament per population slot
    selected = Array.new(pop_size){|i| binary_tournament(pop)}
    children = reproduce(grammar, selected, pop_size, p_cross, h_length)
    children.each{|c| evaluate(c, grammar, bounds)}
    children.sort!{|x,y| x[:fitness] <=> y[:fitness]}
    best = children.first if children.first[:fitness] <= best[:fitness]
    pop = (children+pop).first(pop_size)
    puts " > gen=#{gen}, f=#{best[:fitness]}, g=#{best[:genome]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  grammar = {"FUNC"=>["+","-","*","/"], "TERM"=>["x"]}
  bounds = [1.0, 10.0]
  # algorithm configuration
  h_length = 20
  t_length = h_length * (2-1) + 1
  max_gens = 150
  pop_size = 80
  p_cross = 0.85
  # execute the algorithm
  best = search(grammar, bounds, h_length, t_length, max_gens, pop_size,
    p_cross)
  puts "done! Solution: f=#{best[:fitness]}, program=#{best[:program]}"
end



Listing 3.7: Gene Expression Programming in Ruby 






3.8.8 References 

Primary Sources 

The Gene Expression Programming algorithm was proposed by Ferreira in 
a paper that detailed the approach, provided a careful walkthrough of the 
process and operators, and demonstrated the algorithm on a number of 
benchmark problem instances including symbolic regression [1]. 

Learn More 

Ferreira provided an early and detailed introduction and overview of the 
approach as a book chapter, providing a step-by-step walkthrough of the 
procedure and sample applications [2]. A more contemporary and detailed 
introduction is provided in a later book chapter [3]. Ferreira published a 
book on the approach in 2002 covering background, the algorithm, and 
demonstration applications which is now in its second edition [4]. 

3.8.9 Bibliography 

3.9 Learning Classifier System 

Learning Classifier System, LCS. 

3.9.1 Taxonomy 

The Learning Classifier System algorithm is both an instance of an 
Evolutionary Algorithm from the field of Evolutionary Computation and an 
instance of a Reinforcement Learning algorithm from Machine Learning. 
Internally, Learning Classifier Systems make use of a Genetic Algorithm 
(Section 3.2). The Learning Classifier System is a theoretical system with a 
number of implementations. The two main approaches to implementing and 
investigating the system empirically are the Pittsburgh-style, which seeks 
to optimize the whole classifier, and the Michigan-style, which optimizes 
responsive rulesets. The Michigan-style Learning Classifier is the most 
common and comprises two versions: the ZCS (zeroth-level classifier system) 
and the XCS (accuracy-based classifier system). 

3.9.2 Strategy 

The objective of the Learning Classifier System algorithm is to optimize 
payoff based on exposure to stimuli from a problem-specific environment. 
This is achieved by managing credit assignment for those rules that prove 
useful and searching for new rules and new variations on existing rules using 
an evolutionary process. 

3.9.3 Procedure 

The actors of the system include detectors, messages, effectors, feedback, 
and classifiers. Detectors are used by the system to perceive the state of the 
environment. Messages are the discrete information packets passed from the 
detectors into the system. The system performs information processing on 
messages, and messages may directly result in actions in the environment. 
Effectors control the actions of the system on and within the environment. 
In addition to the system actively perceiving via its detections, it may 
also receive directed feedback from the environment (payoff). Classifiers 
are condition-action rules that provide a filter for messages. If a message 
satisfies the conditional part of the classifier, the action of the classifier 
triggers. Rules act as message processors. A message is a fixed length 
bitstring. A classifier is defined as a ternary string with an alphabet 
$\in \{1, 0, \#\}$, where the # represents do not care (matching either 1 or 0). 
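
The ternary matching rule can be sketched in a few lines. This is an illustrative example only: the function name matches? and the hash layout of the classifier are assumptions made for the sketch, not part of the system definition.

```ruby
# Returns true when a fixed-length bitstring message satisfies a ternary
# condition, where '#' in the condition matches either '0' or '1'.
def matches?(message, condition)
  return false unless message.size == condition.size
  message.chars.zip(condition.chars).all? do |bit, symbol|
    symbol == '#' || symbol == bit
  end
end

# A classifier here is just a condition paired with an action (illustrative).
classifier = {:condition => "1#0###", :action => "1"}
["100011", "110101", "101000"].each do |message|
  result = matches?(message, classifier[:condition])
  puts "#{message} matches #{classifier[:condition]}: #{result}"
end
```

The first two messages satisfy the condition because they agree with it on every position that is not '#'; the third differs in the third position and is filtered out.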

The processing loop for the Learning Classifier system is as follows: 

1. Messages from the environment are placed on the message list. 

2. The conditions of each classifier are checked to see if they are satisfied 
by at least one message in the message list. 

3. All classifiers that are satisfied participate in a competition, those 
that win post their action to the message list. 

4. All messages directed to the effectors are executed (causing actions in 
the environment). 

5. All messages on the message list from the previous cycle are deleted 
(messages persist for a single cycle). 



The algorithm may be described in terms of the main processing loop and 
two sub-algorithms: a reinforcement learning algorithm such as the bucket 
brigade algorithm or Q-learning, and a genetic algorithm for optimization of 
the system. Algorithm 3.9.1 provides a pseudocode listing of the high-level 
processing loop of the Learning Classifier System, specifically the XCS as 
described by Butz and Wilson [3]. 



3.9.4 Heuristics 

The majority of the heuristics in this section are specific to the XCS Learning 
Classifier System as described by Butz and Wilson [3]. 

• Learning Classifier Systems are suited for problems with the following 
characteristics: perpetually novel events with significant noise, contin- 
ual real-time requirements for action, implicitly or inexactly defined 
goals, and sparse payoff or reinforcement obtainable only through long 
sequences of tasks. 

• The learning rate $\beta$ for a classifier's expected payoff, error, and 
fitness is typically in the range [0.1, 0.2]. 

• The frequency of running the genetic algorithm $\theta_{GA}$ should be in 
the range [25, 50]. 

• The discount factor $\gamma$ used in multi-step programs is typically 
around 0.71. 

• The minimum error whereby classifiers are considered to have equal 
accuracy $\epsilon_0$ is typically 10% of the maximum reward. 

• The probability of crossover in the genetic algorithm $\chi$ is typically 
in the range [0.5, 1.0]. 

• The probability of mutating a single position in a classifier in the 
genetic algorithm $\mu$ is typically in the range [0.01, 0.05]. 

Algorithm 3.9.1: Pseudocode for the LCS. 

Input: EnvironmentDetails 
Output: Population 

env ← InitializeEnvironment(EnvironmentDetails); 
Population ← InitializePopulation(); 
ActionSet_{t-1} ← ∅; 
Input_{t-1} ← ∅; 
Reward_{t-1} ← ∅; 
while ¬StopCondition() do 
    Input_t ← env; 
    MatchSet ← GenerateMatchSet(Population, Input_t); 
    Prediction ← GeneratePrediction(MatchSet); 
    Action ← SelectionAction(Prediction); 
    ActionSet_t ← GenerateActionSet(Action, MatchSet); 
    Reward_t ← ExecuteAction(Action, env); 
    if ActionSet_{t-1} ≠ ∅ then 
        Payoff_t ← CalculatePayoff(Reward_{t-1}, Prediction); 
        PerformLearning(ActionSet_{t-1}, Payoff_t, Population); 
        RunGeneticAlgorithm(ActionSet_{t-1}, Input_{t-1}, Population); 
    end 
    if LastStepOfTask(env, Action) then 
        Payoff_t ← Reward_t; 
        PerformLearning(ActionSet_t, Payoff_t, Population); 
        RunGeneticAlgorithm(ActionSet_t, Input_t, Population); 
        ActionSet_{t-1} ← ∅; 
    else 
        ActionSet_{t-1} ← ActionSet_t; 
        Input_{t-1} ← Input_t; 
        Reward_{t-1} ← Reward_t; 
    end 
end 



• The experience threshold during classifier deletion $\theta_{del}$ is 
typically about 20. 

• The experience threshold for a classifier during subsumption 
$\theta_{sub}$ is typically around 20. 

• The initial values for a classifier's expected payoff $p_1$, error 
$\epsilon_1$, and fitness $f_1$ are typically small and close to zero. 

• The probability of selecting a random action for the purposes of 
exploration $p_{exp}$ is typically close to 0.5. 

• The minimum number of different actions that must be specified in a 
match set $\theta_{mna}$ is usually the total number of possible actions in 
the environment for the input. 

• Subsumption should be used on problem domains that are known to 
contain well defined rules for mapping inputs to outputs. 

3.9.5 Code Listing 

Listing 3.8 provides an example of the Learning Classifier System algorithm 
implemented in the Ruby Programming Language. The problem is an 
instance of a Boolean multiplexer called the 6-multiplexer. It can be 
described as a classification problem, where each of the $2^6$ patterns of 
bits is associated with a boolean class $\in \{1, 0\}$. For this problem 
instance, the first two bits may be decoded as an address into the remaining 
four bits that specify the class (for example in 100011, '10' decodes to the 
index '2' in the remaining 4 bits, making the class '1'). In propositional 
logic this problem instance may be described as 
$F = (\neg x_0)(\neg x_1)x_2 + (\neg x_0)x_1x_3 + x_0(\neg x_1)x_4 + x_0x_1x_5$. 
The algorithm is an instance of XCS based on the 
description provided by Butz and Wilson [3] with the parameters based 
on the application of XCS to Boolean multiplexer problems by Wilson 
[14, 15]. The population is grown as needed, and subsumption, which would 
be appropriate for the Boolean multiplexer problem, was not used for brevity. 
The multiplexer problem is a single step problem, so the complexities of 
delayed payoff are not required. A number of parameters were hard coded to 
recommended values, specifically: $\alpha = 0.1$, $\nu = -5.0$, 
$\delta = 0.1$ and $P_\# = \frac{1}{3}$. 

def neg(bit)
  return (bit==1) ? 0 : 1
end

def target_function(s)
  # 6-multiplexer: the first two bits address one of the four data bits
  ints = Array.new(6){|i| s[i].chr.to_i}
  x0,x1,x2,x3,x4,x5 = ints
  return neg(x0)*neg(x1)*x2 + neg(x0)*x1*x3 + x0*neg(x1)*x4 + x0*x1*x5
end

def new_classifier(condition, action, gen, p1=10.0, e1=0.0, f1=10.0)
  other = {}
  other[:condition],other[:action],other[:lasttime] = condition, action, gen
  other[:pred], other[:error], other[:fitness] = p1, e1, f1
  other[:exp], other[:setsize], other[:num] = 0.0, 1.0, 1.0
  return other
end

def copy_classifier(parent)
  copy = {}
  parent.keys.each do |k|
    copy[k] = (parent[k].kind_of? String) ? ""+parent[k] : parent[k]
  end
  copy[:num],copy[:exp] = 1.0, 0.0
  return copy
end

def random_bitstring(size=6)
  return (0...size).inject(""){|s,i| s+((rand<0.5) ? "1" : "0")}
end

def calculate_deletion_vote(classifier, pop, del_thresh, f_thresh=0.1)
  vote = classifier[:setsize] * classifier[:num]
  total = pop.inject(0.0){|s,c| s+c[:num]}
  avg_fitness = pop.inject(0.0){|s,c| s + (c[:fitness]/total)}
  derated = classifier[:fitness] / classifier[:num].to_f
  # experienced classifiers with low fitness receive a larger vote
  if classifier[:exp]>del_thresh and derated<(f_thresh*avg_fitness)
    return vote * (avg_fitness / derated)
  end
  return vote
end

def delete_from_pop(pop, pop_size, del_thresh=20.0)
  total = pop.inject(0) {|s,c| s+c[:num]}
  return if total <= pop_size
  pop.each {|c| c[:dvote] = calculate_deletion_vote(c, pop, del_thresh)}
  vote_sum = pop.inject(0.0) {|s,c| s+c[:dvote]}
  # roulette wheel selection over the deletion votes
  point = rand() * vote_sum
  vote_sum, index = 0.0, 0
  pop.each_with_index do |c,i|
    vote_sum += c[:dvote]
    if vote_sum >= point
      index = i
      break
    end
  end
  if pop[index][:num] > 1
    pop[index][:num] -= 1
  else
    pop.delete_at(index)
  end
end

def generate_random_classifier(input, actions, gen, rate=1.0/3.0)
  condition = ""
  input.size.times {|i| condition << ((rand<rate) ? '#' : input[i].chr)}
  action = actions[rand(actions.size)]
  return new_classifier(condition, action, gen)
end

def does_match?(input, condition)
  input.size.times do |i|
    return false if condition[i].chr!='#' and input[i].chr!=condition[i].chr
  end
  return true
end

def get_actions(pop)
  actions = []
  pop.each do |c|
    actions << c[:action] if !actions.include?(c[:action])
  end
  return actions
end

def generate_match_set(input, pop, all_actions, gen, pop_size)
  match_set = pop.select{|c| does_match?(input, c[:condition])}
  actions = get_actions(match_set)
  # cover any action that is not yet represented in the match set
  while actions.size < all_actions.size do
    remaining = all_actions - actions
    classifier = generate_random_classifier(input, remaining, gen)
    pop << classifier
    match_set << classifier
    delete_from_pop(pop, pop_size)
    actions << classifier[:action]
  end
  return match_set
end

def generate_prediction(match_set)
  pred = {}
  match_set.each do |classifier|
    key = classifier[:action]
    pred[key] = {:sum=>0.0,:count=>0.0,:weight=>0.0} if pred[key].nil?
    pred[key][:sum] += classifier[:pred]*classifier[:fitness]
    pred[key][:count] += classifier[:fitness]
  end
  pred.keys.each do |key|
    pred[key][:weight] = 0.0
    if pred[key][:count] > 0
      pred[key][:weight] = pred[key][:sum]/pred[key][:count]
    end
  end
  return pred
end

def select_action(predictions, p_explore=false)
  keys = Array.new(predictions.keys)
  return keys[rand(keys.size)] if p_explore
  keys.sort!{|x,y| predictions[y][:weight]<=>predictions[x][:weight]}
  return keys.first
end

def update_set(action_set, reward, beta=0.2)
  sum = action_set.inject(0.0) {|s,other| s+other[:num]}
  action_set.each do |c|
    c[:exp] += 1.0
    if c[:exp] < 1.0/beta
      # running average while the classifier is inexperienced
      c[:error] = (c[:error]*(c[:exp]-1.0)+(reward-c[:pred]).abs)/c[:exp]
      c[:pred] = (c[:pred] * (c[:exp]-1.0) + reward) / c[:exp]
      c[:setsize] = (c[:setsize]*(c[:exp]-1.0)+sum) / c[:exp]
    else
      c[:error] += beta * ((reward-c[:pred]).abs - c[:error])
      c[:pred] += beta * (reward-c[:pred])
      c[:setsize] += beta * (sum - c[:setsize])
    end
  end
end

def update_fitness(action_set, min_error=10, l_rate=0.2, alpha=0.1, v=-5.0)
  sum = 0.0
  acc = Array.new(action_set.size)
  action_set.each_with_index do |c,i|
    acc[i] = (c[:error]<min_error) ? 1.0 : alpha*(c[:error]/min_error)**v
    sum += acc[i] * c[:num].to_f
  end
  action_set.each_with_index do |c,i|
    c[:fitness] += l_rate * ((acc[i] * c[:num].to_f) / sum - c[:fitness])
  end
end

def can_run_genetic_algorithm(action_set, gen, ga_freq)
  return false if action_set.size <= 2
  total = action_set.inject(0.0) {|s,c| s+c[:lasttime]*c[:num]}
  sum = action_set.inject(0.0) {|s,c| s+c[:num]}
  return true if gen - (total/sum) > ga_freq
  return false
end

def binary_tournament(pop)
  i, j = rand(pop.size), rand(pop.size)
  j = rand(pop.size) while j==i
  return (pop[i][:fitness] > pop[j][:fitness]) ? pop[i] : pop[j]
end

def mutation(cl, action_set, input, rate=0.04)
  cl[:condition].size.times do |i|
    if rand() < rate
      cl[:condition][i] = (cl[:condition][i].chr=='#') ? input[i] : '#'
    end
  end
  if rand() < rate
    subset = action_set - [cl[:action]]
    cl[:action] = subset[rand(subset.size)]
  end
end

def uniform_crossover(parent1, parent2)
  child = ""
  parent1.size.times do |i|
    child << ((rand()<0.5) ? parent1[i].chr : parent2[i].chr)
  end
  return child
end

def insert_in_pop(cla, pop)
  pop.each do |c|
    if cla[:condition]==c[:condition] and cla[:action]==c[:action]
      c[:num] += 1
      return
    end
  end
  pop << cla
end

def crossover(c1, c2, p1, p2)
  c1[:condition] = uniform_crossover(p1[:condition], p2[:condition])
  c2[:condition] = uniform_crossover(p1[:condition], p2[:condition])
  c2[:pred] = c1[:pred] = (p1[:pred]+p2[:pred])/2.0
  c2[:error] = c1[:error] = 0.25*(p1[:error]+p2[:error])/2.0
  c2[:fitness] = c1[:fitness] = 0.1*(p1[:fitness]+p2[:fitness])/2.0
end

def run_ga(actions, pop, action_set, input, gen, pop_size, crate=0.8)
  p1, p2 = binary_tournament(action_set), binary_tournament(action_set)
  c1, c2 = copy_classifier(p1), copy_classifier(p2)
  crossover(c1, c2, p1, p2) if rand() < crate
  [c1,c2].each do |c|
    mutation(c, actions, input)
    insert_in_pop(c, pop)
  end
  # trim until the total numerosity fits within the population size
  while pop.inject(0) {|s,c| s+c[:num]} > pop_size
    delete_from_pop(pop, pop_size)
  end
end

def train_model(pop_size, max_gens, actions, ga_freq)
  pop, perf = [], []
  max_gens.times do |gen|
    # alternate between explore and exploit iterations
    explore = gen.modulo(2)==0
    input = random_bitstring()
    match_set = generate_match_set(input, pop, actions, gen, pop_size)
    pred_array = generate_prediction(match_set)
    action = select_action(pred_array, explore)
    reward = (target_function(input)==action.to_i) ? 1000.0 : 0.0
    if explore
      action_set = match_set.select{|c| c[:action]==action}
      update_set(action_set, reward)
      update_fitness(action_set)
      if can_run_genetic_algorithm(action_set, gen, ga_freq)
        action_set.each {|c| c[:lasttime] = gen}
        run_ga(actions, pop, action_set, input, gen, pop_size)
      end
    else
      e,a = (pred_array[action][:weight]-reward).abs, ((reward==1000.0)?1:0)
      perf << {:error=>e,:correct=>a}
      if perf.size >= 50
        err = (perf.inject(0){|s,x|s+x[:error]}/perf.size).round
        acc = perf.inject(0.0){|s,x|s+x[:correct]}/perf.size
        puts " >iter=#{gen+1} size=#{pop.size}, error=#{err}, acc=#{acc}"
        perf = []
      end
    end
  end
  return pop
end

def test_model(system, num_trials=50)
  correct = 0
  num_trials.times do
    input = random_bitstring()
    match_set = system.select{|c| does_match?(input, c[:condition])}
    pred_array = generate_prediction(match_set)
    action = select_action(pred_array, false)
    correct += 1 if target_function(input) == action.to_i
  end
  puts "Done! classified correctly=#{correct}/#{num_trials}"
  return correct
end

def execute(pop_size, max_gens, actions, ga_freq)
  system = train_model(pop_size, max_gens, actions, ga_freq)
  test_model(system)
  return system
end

if __FILE__ == $0
  # problem configuration
  all_actions = ['0', '1']
  # algorithm configuration
  max_gens, pop_size = 5000, 200
  ga_freq = 25
  # execute the algorithm
  execute(pop_size, max_gens, all_actions, ga_freq)
end

Listing 3.8: Learning Classifier System in Ruby 



3.9.6 References 

Primary Sources 

Early ideas on the theory of Learning Classifier Systems were proposed 
by Holland [4, 7], culminating in a standardized presentation a few years 
later [5]. A number of implementations of the theoretical system were 
investigated, although a taxonomy of the two main streams was proposed by 
De Jong [9]: 1) Pittsburgh-style proposed by Smith [11, 12] and 2) Holland- 
style or Michigan-style Learning classifiers that are further comprised of the 
Zeroth-level classifier (ZCS) [13] and the accuracy-based classifier (XCS) 
[14]. 

Learn More 

Booker, Goldberg, and Holland provide a classical introduction to Learning 
Classifier Systems including an overview of the state of the field and the 
algorithm in detail [1]. Wilson and Goldberg also provide an introduction 
and review of the approach, taking a more critical stance [16]. Holmes et al. 
provide a contemporary review of the field focusing both on a description of 
the method and application areas in which the approach has been 
demonstrated successfully [8]. Lanzi, Stolzmann, and Wilson provide a seminal 
book in the field as a collection of papers covering the basics, advanced 
topics, and demonstration applications; a particular highlight from this book 
is the first section that provides a concise description of Learning Classifier 
Systems by many leaders and major contributors to the field [6], providing 
rare insight. Another paper from Lanzi and Riolo's book provides a detailed 
review of the development of the approach as it matured throughout the 
1990s [10]. Bull and Kovacs provide a second introductory book to the 
field focusing on the theory of the approach and its practical application [2]. 

3.10 Non-dominated Sorting Genetic Algorithm 

Non-dominated Sorting Genetic Algorithm, Nondominated Sorting Genetic 
Algorithm, Fast Elitist Non-dominated Sorting Genetic Algorithm, NSGA, 
NSGA-II, NSGAII. 

3.10.1 Taxonomy 

The Non-dominated Sorting Genetic Algorithm is a Multiple Objective Opti- 
mization (MOO) algorithm and is an instance of an Evolutionary Algorithm 
from the field of Evolutionary Computation. Refer to Section 9.5.3 for more 
information and references on Multiple Objective Optimization. NSGA 
is an extension of the Genetic Algorithm for multiple objective function 
optimization (Section 3.2). It is related to other Evolutionary Multiple 
Objective Optimization Algorithms (EMOO) (or Multiple Objective 
Evolutionary Algorithms, MOEA) such as the Vector-Evaluated Genetic Algorithm 
(VEGA), Strength Pareto Evolutionary Algorithm (SPEA) (Section 3.11), 
and Pareto Archived Evolution Strategy (PAES). There are two versions of 
the algorithm, the classical NSGA and the updated and currently canonical 
form NSGA-II. 

3.10.2 Strategy 

The objective of the NSGA algorithm is to improve the adaptive fit of a 
population of candidate solutions to a Pareto front constrained by a set 
of objective functions. The algorithm uses an evolutionary process with 
surrogates for evolutionary operators including selection, genetic crossover, 
and genetic mutation. The population is sorted into a hierarchy of sub- 
populations based on the ordering of Pareto dominance. Similarity between 
members of each sub-group is evaluated on the Pareto front, and the 
resulting groups and similarity measures are used to promote a diverse front 
of non-dominated solutions. 
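
The ordering by Pareto dominance described above can be sketched as follows. This is a minimal illustration for minimization that assumes strict dominance and peels off one front at a time; it is a simpler procedure than the fast non-dominated sort of NSGA-II, and the function names dominates? and nondominated_fronts are hypothetical.

```ruby
# Strict Pareto dominance for minimization: a is no worse than b on every
# objective and strictly better on at least one.
def dominates?(a, b)
  no_worse = a.each_index.all? { |i| a[i] <= b[i] }
  better = a.each_index.any? { |i| a[i] < b[i] }
  no_worse && better
end

# Peel off successive non-dominated fronts (a simple repeated scan, not the
# bookkeeping-based fast sort).
def nondominated_fronts(points)
  remaining = points.dup
  fronts = []
  until remaining.empty?
    front = remaining.select do |p|
      remaining.none? { |q| dominates?(q, p) }
    end
    fronts << front
    remaining -= front
  end
  fronts
end

points = [[1.0, 5.0], [2.0, 2.0], [4.0, 1.0], [3.0, 3.0], [5.0, 5.0]]
fronts = nondominated_fronts(points)
fronts.each_with_index do |front, rank|
  puts "rank #{rank}: #{front.inspect}"
end
```

In this example the first three points are mutually non-dominated and form rank 0, the point (3, 3) is dominated only by (2, 2) and forms rank 1, and (5, 5) is left for rank 2.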

3.10.3 Procedure 

Algorithm 3.10.1 provides a pseudocode listing of the Non-dominated 
Sorting Genetic Algorithm II (NSGA-II) for minimizing a cost function. The 
SortByRankAndDistance function orders the population into a hierarchy 
of non-dominated Pareto fronts. The CrowdingDistanceAssignment function 
calculates the average distance between members of each front on the front 
itself. Refer to Deb et al. for a clear presentation of the pseudocode 
and explanation of these functions [4]. The CrossoverAndMutation 
function performs the classical crossover and mutation genetic operators of 
the Genetic Algorithm. Both the SelectParentsByRankAndDistance and 
SortByRankAndDistance functions discriminate members of the population 
first by rank (order of dominated precedence of the front to which 
the solution belongs) and then distance within the front (calculated by 
CrowdingDistanceAssignment). 
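
The crowding distance calculation can be illustrated on a single small front. The sketch below follows the commonly stated form of the procedure, in which the front is sorted separately for each objective; the function name crowding_distances and the representation of a solution as a plain array of objective values are assumptions made for the example.

```ruby
# Crowding distance for one front: for each objective, sort the front, give
# the two boundary solutions an infinite distance, and add the normalized
# gap between the neighbours of each interior solution.
def crowding_distances(front)
  dist = Array.new(front.size, 0.0)
  return dist if front.empty?
  num_objectives = front.first.size
  num_objectives.times do |m|
    order = (0...front.size).sort_by { |i| front[i][m] }
    range = (front[order.last][m] - front[order.first][m]).to_f
    dist[order.first] = dist[order.last] = Float::INFINITY
    next if range == 0.0
    (1...(order.size - 1)).each do |k|
      gap = front[order[k + 1]][m] - front[order[k - 1]][m]
      dist[order[k]] += gap / range
    end
  end
  dist
end

front = [[0.0, 4.0], [1.0, 1.0], [4.0, 0.0]]
distances = crowding_distances(front)
p distances
```

The two extreme solutions receive an infinite distance so that they are always preferred within the front, while the interior solution accumulates one normalized gap per objective. Larger distances indicate less crowded regions of the front.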



Algorithm 3.10.1: Pseudocode for NSGAII. 

Input: Population_size, ProblemSize, P_crossover, P_mutation 
Output: Children 

Population ← InitializePopulation(Population_size, ProblemSize); 
EvaluateAgainstObjectiveFunctions(Population); 
FastNondominatedSort(Population); 
Selected ← SelectParentsByRank(Population, Population_size); 
Children ← CrossoverAndMutation(Selected, P_crossover, P_mutation); 
while ¬StopCondition() do 
    EvaluateAgainstObjectiveFunctions(Children); 
    Union ← Merge(Population, Children); 
    Fronts ← FastNondominatedSort(Union); 
    Parents ← ∅; 
    Front_L ← ∅; 
    foreach Front_i ∈ Fronts do 
        CrowdingDistanceAssignment(Front_i); 
        if Size(Parents) + Size(Front_i) > Population_size then 
            Front_L ← i; 
            Break(); 
        else 
            Parents ← Merge(Parents, Front_i); 
        end 
    end 
    if Size(Parents) < Population_size then 
        Front_L ← SortByRankAndDistance(Front_L); 
        for P_1 to P_{Population_size − Size(Front_L)} do 
            Parents ← P_i; 
        end 
    end 
    Selected ← SelectParentsByRankAndDistance(Parents, Population_size); 
    Population ← Children; 
    Children ← CrossoverAndMutation(Selected, P_crossover, P_mutation); 
end 
return Children; 

3.10.4 Heuristics 

• NSGA was designed for and is suited to continuous function multiple 
objective optimization problem instances. 

• A binary representation can be used in conjunction with classical 
genetic operators such as one-point crossover and point mutation. 

• A real-valued representation is recommended for continuous function 
optimization problems, in turn requiring representation specific genetic 
operators such as Simulated Binary Crossover (SBX) and polynomial 
mutation [2]. 
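
As an illustration of a representation specific operator, the following sketch shows Simulated Binary Crossover for a single real-valued variable in its commonly stated form. It is not taken from the listing in this section: the function name sbx_pair, the default distribution index of 2.0, and the exposure of the random number u as a parameter are assumptions made so that the example is reproducible. Bounds handling (such as clipping children to the domain) is deliberately omitted.

```ruby
# Simulated Binary Crossover for a single real-valued variable.
# `eta` is the distribution index (larger values keep children closer to
# the parents); `u` is a uniform random number in [0, 1), exposed as a
# parameter here only so the example is reproducible.
def sbx_pair(p1, p2, eta = 2.0, u = rand)
  beta = if u <= 0.5
    (2.0 * u) ** (1.0 / (eta + 1.0))
  else
    (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
  end
  c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
  c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
  [c1, c2]
end

c1, c2 = sbx_pair(1.0, 3.0, 2.0, 0.5)
puts "u=0.5 gives children #{c1} and #{c2}"
a, b = sbx_pair(1.0, 3.0)
puts "random u gives children #{a} and #{b} (mean #{(a + b) / 2.0})"
```

With u = 0.5 the spread factor is exactly 1, so the children coincide with the parents. For any u the two children are symmetric about the midpoint of the parents, so their mean equals the mean of the parents.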

3.10.5 Code Listing 

Listing 3.9 provides an example of the Non-dominated Sorting Genetic 
Algorithm II (NSGA-II) implemented in the Ruby Programming Language. 
The demonstration problem is an instance of continuous multiple objective 
function optimization called SCH (problem one in [4]). The problem seeks 
the minimum of two functions: $f_1 = \sum_{i=1}^{n} x_i^2$ and 
$f_2 = \sum_{i=1}^{n} (x_i - 2)^2$, where $-10 \leq x_i \leq 10$ and 
$n = 1$. The optimal solutions for this function are 
$x \in [0, 2]$. The algorithm is an implementation of NSGA-II based on 
the presentation by Deb et al. [4]. The algorithm uses a binary string 
representation (16 bits per objective function parameter) that is decoded 
and rescaled to the function domain. The implementation uses a uniform 
crossover operator and point mutations with a fixed mutation rate of 
$\frac{1}{L}$, where L is the number of bits in a solution's binary string. 
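
The decoding of the binary representation and the two SCH objectives can be sketched as follows. This is a compact illustration rather than the listing itself: the function names decode_bits and sch_objectives are hypothetical, and the sketch assumes the most significant bit of each parameter comes first.

```ruby
# Decode a bit string into one real value per parameter, rescaled to bounds.
# Assumption: each parameter is stored most significant bit first.
def decode_bits(bitstring, bounds, bits_per_param = 16)
  bounds.each_with_index.map do |(min, max), i|
    chunk = bitstring[i * bits_per_param, bits_per_param]
    min + (max - min) * chunk.to_i(2) / ((2**bits_per_param) - 1).to_f
  end
end

# The two SCH objectives, both to be minimized.
def sch_objectives(vector)
  f1 = vector.inject(0.0) { |sum, x| sum + x**2 }
  f2 = vector.inject(0.0) { |sum, x| sum + (x - 2.0)**2 }
  [f1, f2]
end

bounds = [[-10.0, 10.0]]
["0" * 16, "1" * 16].each do |bits|
  x = decode_bits(bits, bounds)
  puts "#{bits} -> x=#{x.inspect}, objectives=#{sch_objectives(x).inspect}"
end
[0.0, 1.0, 2.0].each do |x|
  puts "x=#{x}: objectives=#{sch_objectives([x]).inspect}"
end
```

The all-zero and all-one strings decode to the two ends of the domain. Between x = 0 and x = 2 the first objective rises while the second falls, which is why every point in that interval is a trade-off solution on the Pareto front.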

def objective1(vector)
  return vector.inject(0.0) {|sum, x| sum + (x**2.0)}
end

def objective2(vector)
  return vector.inject(0.0) {|sum, x| sum + ((x-2.0)**2.0)}
end

def decode(bitstring, search_space, bits_per_param)
  vector = []
  search_space.each_with_index do |bounds, i|
    off, sum = i*bits_per_param, 0.0
    param = bitstring[off...(off+bits_per_param)].reverse
    param.size.times do |j|
      sum += ((param[j].chr=='1') ? 1.0 : 0.0) * (2.0 ** j.to_f)
    end
    # rescale the integer value to the bounds of the parameter
    min, max = bounds
    vector << min + ((max-min)/((2.0**bits_per_param.to_f)-1.0)) * sum
  end
  return vector
end

def random_bitstring(num_bits)
  return (0...num_bits).inject(""){|s,i| s<<((rand<0.5) ? "1" : "0")}
end

def point_mutation(bitstring, rate=1.0/bitstring.size)
  child = ""
  bitstring.size.times do |i|
    bit = bitstring[i].chr
    child << ((rand()<rate) ? ((bit=='1') ? "0" : "1") : bit)
  end
  return child
end

def crossover(parent1, parent2, rate)
  return ""+parent1 if rand()>=rate
  child = ""
  parent1.size.times do |i|
    child << ((rand()<0.5) ? parent1[i].chr : parent2[i].chr)
  end
  return child
end

def reproduce(selected, pop_size, p_cross)
  children = []
  selected.each_with_index do |p1, i|
    p2 = (i.modulo(2)==0) ? selected[i+1] : selected[i-1]
    p2 = selected[0] if i == selected.size-1
    child = {}
    child[:bitstring] = crossover(p1[:bitstring], p2[:bitstring], p_cross)
    child[:bitstring] = point_mutation(child[:bitstring])
    children << child
    break if children.size >= pop_size
  end
  return children
end

def calculate_objectives(pop, search_space, bits_per_param)
  pop.each do |p|
    p[:vector] = decode(p[:bitstring], search_space, bits_per_param)
    p[:objectives] = [objective1(p[:vector]), objective2(p[:vector])]
  end
end

def dominates(p1, p2)
  # true when p1 is no worse than p2 on every objective (minimization)
  p1[:objectives].each_index do |i|
    return false if p1[:objectives][i] > p2[:objectives][i]
  end
  return true
end

def fast_nondominated_sort(pop)
  fronts = Array.new(1){[]}
  pop.each do |p1|
    p1[:dom_count], p1[:dom_set] = 0, []
    pop.each do |p2|
      if dominates(p1, p2)
        p1[:dom_set] << p2
      elsif dominates(p2, p1)
        p1[:dom_count] += 1
      end
    end
    if p1[:dom_count] == 0
      p1[:rank] = 0
      fronts.first << p1
    end
  end
  curr = 0
  begin
    next_front = []
    fronts[curr].each do |p1|
      p1[:dom_set].each do |p2|
        p2[:dom_count] -= 1
        if p2[:dom_count] == 0
          p2[:rank] = (curr+1)
          next_front << p2
        end
      end
    end
    curr += 1
    fronts << next_front if !next_front.empty?
  end while curr < fronts.size
  return fronts
end

def calculate_crowding_distance (pop) 
pop. each {Ipl p[:dist] = 0.0} 
num_obs = pop. first [: objectives] . size 
nuin_obs. times do |i| 

min = pop . min-[ I X , y I x [ : obj ectives] [i] <=>y [ : ob j ect Ives] [1] } 

max = pop. max-[ I x,y I x[:objectives] [i]<=>y[:objectives] [i]} 

rge = max [: objectives] [i] - min[: objectives] [i] 

pop. first [:dist] , pop. last [:dist] = 1.0/0.0, 1.0/0.0 

next if rge ==0.0 

(1 ... (pop. size-1) ). each do |j| 

pop[j] [:dist]+=(pop[j+l] [: objectives] [i]-pop[j-l] [: objectives] [i])/rge 
end 
end 
end 

def crowded_comparison_operator (x,y) 

return y[:dist]<=>x[:dist] if x[:rank] == y[:rank] 

return x [ : rank] <=>y [ : rank] 
end 

def better (x,y) 

if !x[:dist] .nil? and x[:rank] == y[:rank] 

return (x[:dist]>y[:dist]) ? x : y 
end 

return (x [: rank] <y [: rank] ) ? x : y 
end 

# Fill the next parent set front by front; break ties in the last front
# that does not fit using the crowded comparison operator.
def select_parents(fronts, pop_size)
  fronts.each {|f| calculate_crowding_distance(f)}
  offspring, last_front = [], 0
  fronts.each do |front|
    break if (offspring.size+front.size) > pop_size
    front.each {|p| offspring << p}
    last_front += 1
  end
  if (remaining = pop_size-offspring.size) > 0
    fronts[last_front].sort! {|x,y| crowded_comparison_operator(x,y)}
    offspring += fronts[last_front][0...remaining]
  end
  return offspring
end

def weighted_sum(x)
  return x[:objectives].inject(0.0) {|sum, x| sum+x}
end

def search(search_space, max_gens, pop_size, p_cross, bits_per_param=16)
  pop = Array.new(pop_size) do |i|
    {:bitstring=>random_bitstring(search_space.size*bits_per_param)}
  end
  calculate_objectives(pop, search_space, bits_per_param)
  fast_nondominated_sort(pop)
  selected = Array.new(pop_size) do
    better(pop[rand(pop_size)], pop[rand(pop_size)])
  end
  children = reproduce(selected, pop_size, p_cross)
  calculate_objectives(children, search_space, bits_per_param)
  max_gens.times do |gen|
    union = pop + children
    fronts = fast_nondominated_sort(union)
    parents = select_parents(fronts, pop_size)
    selected = Array.new(pop_size) do
      better(parents[rand(pop_size)], parents[rand(pop_size)])
    end
    pop = children
    children = reproduce(selected, pop_size, p_cross)
    calculate_objectives(children, search_space, bits_per_param)
    best = parents.sort!{|x,y| weighted_sum(x)<=>weighted_sum(y)}.first
    best_s = "[x=#{best[:vector]}, objs=#{best[:objectives].join(', ')}]"
    puts " > gen=#{gen+1}, fronts=#{fronts.size}, best=#{best_s}"
  end
  union = pop + children
  fronts = fast_nondominated_sort(union)
  parents = select_parents(fronts, pop_size)
  return parents
end

if __FILE__ == $0
  # problem configuration
  problem_size = 1
  search_space = Array.new(problem_size) {|i| [-10, 10]}
  # algorithm configuration
  max_gens = 50
  pop_size = 100
  p_cross = 0.98
  # execute the algorithm
  pop = search(search_space, max_gens, pop_size, p_cross)
  puts "done!"
end

Listing 3.9: NSGA-II in Ruby 



3.10.6 References 

Primary Sources 

Srinivas and Deb proposed the NSGA inspired by Goldberg's notion of a 
non-dominated sorting procedure [6]. Goldberg proposed a non-dominated 
sorting procedure in his book in considering the biases in the Pareto optimal 
solutions provided by VEGA [5]. Srinivas and Deb's NSGA used the sorting 
procedure as a ranking selection method, and a fitness sharing niching 
method to maintain stable sub-populations across the Pareto front. Deb 
et al. later extended NSGA to address three criticisms of the approach: the
$O(mN^3)$ time complexity, the lack of elitism, and the need for a sharing
parameter for the fitness sharing niching method [3, 4].

Learn More 

Deb provides in-depth coverage of Evolutionary Multiple Objective
Optimization algorithms in his book, including a detailed description of the
NSGA in Chapter 5 [1].
3.11 Strength Pareto Evolutionary Algorithm

Strength Pareto Evolutionary Algorithm, SPEA, SPEA2. 

3.11.1 Taxonomy 

Strength Pareto Evolutionary Algorithm is a Multiple Objective Optimiza- 
tion (MOO) algorithm and an Evolutionary Algorithm from the field of 
Evolutionary Computation. It belongs to the field of Evolutionary Multiple 
Objective (EMO) algorithms. Refer to Section 9.5.3 for more information 
and references on Multiple Objective Optimization. Strength Pareto Evo- 
lutionary Algorithm is an extension of the Genetic Algorithm for multiple 
objective optimization problems (Section 3.2). It is related to sibling Evo- 
lutionary Algorithms such as Non-dominated Sorting Genetic Algorithm 
(NSGA) (Section 3.10), Vector-Evaluated Genetic Algorithm (VEGA), and
Pareto Archived Evolution Strategy (PAES). There are two versions of 
SPEA, the original SPEA algorithm and the extension SPEA2. Additional 
extensions include SPEA+ and iSPEA. 

3.11.2 Strategy 

The objective of the algorithm is to locate and maintain a front of
non-dominated solutions, ideally a set of Pareto optimal solutions. This is 
achieved by using an evolutionary process (with surrogate procedures for 
genetic recombination and mutation) to explore the search space, and a 
selection process that uses a combination of the degree to which a candi- 
date solution is dominated (strength) and an estimation of density of the 
Pareto front as an assigned fitness. An archive of the non-dominated set is 
maintained separate from the population of candidate solutions used in the 
evolutionary process, providing a form of elitism. 
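The notion of a non-dominated set can be made concrete with a small sketch. It assumes minimization and strict Pareto dominance (no worse in every objective and strictly better in at least one). The helper names `strictly_dominates?` and `non_dominated` and the sample points are illustrative assumptions and are not taken from the SPEA2 listing.

```ruby
# Strict Pareto dominance for minimization (an assumption of this sketch):
# a dominates b when a is no worse in every objective and better in at least one.
def strictly_dominates?(a, b)
  no_worse = a.each_index.all? { |i| a[i] <= b[i] }
  better = a.each_index.any? { |i| a[i] < b[i] }
  no_worse && better
end

# Keep only the objective vectors that no other vector dominates.
def non_dominated(points)
  points.reject do |p|
    points.any? { |q| strictly_dominates?(q, p) }
  end
end

# Hypothetical objective vectors [f1, f2].
points = [[0.0, 4.0], [1.0, 1.0], [4.0, 0.0], [2.0, 2.0], [5.0, 5.0]]
front = non_dominated(points)
puts front.inspect
```

Here `[2.0, 2.0]` and `[5.0, 5.0]` are both dominated by `[1.0, 1.0]`, so only the remaining three points would be candidates for an archive.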

3.11.3 Procedure 

Algorithm 3.11.1 provides a pseudocode listing of the Strength Pareto 
Evolutionary Algorithm 2 (SPEA2) for minimizing a cost function. The 
CalculateRawFitness function calculates the raw fitness as the sum of the
strength values of the solutions that dominate a given candidate, where
strength is the number of solutions that a given solution dominates. The
CalculateSolutionDensity function estimates the density of an area of the
Pareto front as $\frac{1.0}{\sigma^k + 2}$, where $\sigma^k$ is the Euclidean
distance of the objective values between a given solution and the $k$th
nearest neighbor of the solution, and $k$ is the square root of the size of
the population and archive combined. The PopulateWithRemainingBest function
iteratively fills the archive with the remaining candidate solutions in order
of fitness. The RemoveMostSimilar function truncates the archive population,
removing those members with the smallest $\sigma^k$ values as calculated
against the archive. The SelectParents function selects parents from a
population using a Genetic Algorithm selection method such as binary
tournament selection. The CrossoverAndMutation function performs the
crossover and mutation genetic operators from the Genetic Algorithm.
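The strength and raw fitness calculations described above can be sketched on bare objective vectors. This is a minimal illustration that assumes minimization and strict Pareto dominance; the function names and the sample points are hypothetical and differ from the listing later in this section.

```ruby
# Strict Pareto dominance for minimization (assumed for this sketch).
def dominates?(a, b)
  a.each_index.all? { |i| a[i] <= b[i] } && a.each_index.any? { |i| a[i] < b[i] }
end

# Strength of each point: the number of points it dominates.
def strengths(points)
  points.map { |p| points.count { |q| dominates?(p, q) } }
end

# Raw fitness of each point: the summed strength of the points that dominate it.
# Non-dominated points therefore receive a raw fitness of zero.
def raw_fitnesses(points)
  s = strengths(points)
  points.map do |p|
    points.each_index.select { |j| dominates?(points[j], p) }.sum { |j| s[j] }
  end
end

# Hypothetical objective vectors [f1, f2].
points = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [0.0, 4.0]]
puts strengths(points).inspect
puts raw_fitnesses(points).inspect
```

Lower raw fitness is better in this scheme, which is why a density term below one can be added without changing the ordering between dominated and non-dominated solutions.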

Algorithm 3.11.1: Pseudocode for SPEA2.
Input: Population_size, Archive_size, ProblemSize, P_crossover, P_mutation
Output: Archive

Population ← InitializePopulation(Population_size, ProblemSize);
Archive ← ∅;
while ¬StopCondition() do
    for S_i ∈ Population do
        S_i.objectives ← CalculateObjectives(S_i);
    end
    Union ← Population + Archive;
    for S_i ∈ Union do
        S_i.raw ← CalculateRawFitness(S_i, Union);
        S_i.density ← CalculateSolutionDensity(S_i, Union);
        S_i.fitness ← S_i.raw + S_i.density;
    end
    Archive ← GetNonDominated(Union);
    if Size(Archive) < Archive_size then
        PopulateWithRemainingBest(Union, Archive, Archive_size);
    else if Size(Archive) > Archive_size then
        RemoveMostSimilar(Archive, Archive_size);
    end
    Selected ← SelectParents(Archive, Population_size);
    Population ← CrossoverAndMutation(Selected, P_crossover, P_mutation);
end
return GetNonDominated(Archive);



3.11.4 Heuristics 

• SPEA was designed for and is suited to combinatorial and continuous 
function multiple objective optimization problem instances. 

• A binary representation can be used for continuous function optimiza- 
tion problems in conjunction with classical genetic operators such as 
one-point crossover and point mutation. 
• A k value of 1 may be used for efficiency whilst still providing useful
results. 

• The size of the archive is commonly smaller than the size of the 
population. 

• There is a lot of room for implementation optimization in density and 
Pareto dominance calculations. 
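The effect of the k value on the density estimate can be sketched as follows. The sketch assumes the density form 1 / (distance to the k-th nearest neighbor + 2) and excludes the point itself from its own neighbor list, which is an assumption of this example. The function names and sample points are hypothetical.

```ruby
# Euclidean distance between two objective vectors.
def euclidean_distance(a, b)
  Math.sqrt(a.each_index.sum { |i| (a[i] - b[i])**2 })
end

# Density of a point: 1 / (sigma_k + 2), where sigma_k is the distance to its
# k-th nearest neighbor. The point itself is excluded here (an assumption).
def density(point_index, points, k)
  others = points.each_index.reject { |j| j == point_index }
  distances = others.map { |j| euclidean_distance(points[point_index], points[j]) }.sort
  1.0 / (distances[k - 1] + 2.0)
end

# Hypothetical objective vectors: two close together and one isolated.
points = [[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
puts density(0, points, 1)
puts density(2, points, 1)
puts density(0, points, 2)
```

With k = 1 only the nearest neighbor is needed, so the crowded point at the origin gets a higher density than the isolated point. The density is always below 0.5, so it only breaks ties between solutions with equal raw fitness.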

3.11.5 Code Listing 

Listing 3.10 provides an example of the Strength Pareto Evolutionary 
Algorithm 2 (SPEA2) implemented in the Ruby Programming Language. 
The demonstration problem is an instance of continuous multiple objective 
function optimization called SCH (problem one in [1]). The problem seeks 
the minimum of two functions: $f_1 = \sum_{i=1}^{n} x_i^2$ and
$f_2 = \sum_{i=1}^{n} (x_i - 2)^2$, with $-10 \le x_i \le 10$ and $n = 1$.
The optimal solutions for this function are $x \in [0, 2]$. The algorithm is
an implementation of SPEA2 based on the presentation by Zitzler, Laumanns,
and Thiele [5]. The algorithm uses a binary string representation (16 bits
per objective function parameter) that is decoded and rescaled to the
function domain. The implementation uses a uniform crossover operator and
point mutations with a fixed mutation rate of $\frac{1}{L}$, where $L$ is
the number of bits in a solution's binary string.
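The trade-off between the two SCH objectives, and the rescaling of a bit string to the function domain, can be sketched briefly. The helper names `f1`, `f2`, and `decode_param` are illustrative assumptions. The decoding treats the leftmost bit as most significant.

```ruby
# The two SCH objectives for a single variable x (n = 1).
def f1(x)
  x**2
end

def f2(x)
  (x - 2.0)**2
end

# Decode a bit string to a real value in [min, max] (illustrative helper).
def decode_param(bits, min, max)
  int_value = bits.to_i(2)
  min + (max - min) * int_value / (2.0**bits.size - 1.0)
end

# Moving x from 0 to 2 trades f1 against f2; beyond 2 both get worse.
[0.0, 1.0, 2.0, 3.0].each do |x|
  puts "x=#{x} f1=#{f1(x)} f2=#{f2(x)}"
end
puts decode_param("0" * 16, -10.0, 10.0)
puts decode_param("1" * 16, -10.0, 10.0)
```

Every x between 0 and 2 improves one objective only at the expense of the other, which is why that interval forms the Pareto optimal set.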

def objective1(vector)
  return vector.inject(0.0) {|sum, x| sum + (x**2.0)}
end

def objective2(vector)
  return vector.inject(0.0) {|sum, x| sum + ((x-2.0)**2.0)}
end

# Decode each fixed-width slice of the bitstring into a real value within
# the bounds of the corresponding parameter.
def decode(bitstring, search_space, bits_per_param)
  vector = []
  search_space.each_with_index do |bounds, i|
    off, sum = i*bits_per_param, 0.0
    param = bitstring[off...(off+bits_per_param)].reverse
    param.size.times do |j|
      sum += ((param[j].chr=='1') ? 1.0 : 0.0) * (2.0 ** j.to_f)
    end
    min, max = bounds
    vector << min + ((max-min)/((2.0**bits_per_param.to_f)-1.0)) * sum
  end
  return vector
end

def point_mutation(bitstring, rate=1.0/bitstring.size)
  child = ""
  bitstring.size.times do |i|
    bit = bitstring[i].chr
    child << ((rand()<rate) ? ((bit=='1') ? "0" : "1") : bit)
  end
  return child
end

# Pick two distinct members at random and return the one with lower fitness.
def binary_tournament(pop)
  i, j = rand(pop.size), rand(pop.size)
  j = rand(pop.size) while j==i
  return (pop[i][:fitness] < pop[j][:fitness]) ? pop[i] : pop[j]
end

# Uniform crossover: each bit is copied from either parent with equal chance.
def crossover(parent1, parent2, rate)
  return ""+parent1 if rand()>=rate
  child = ""
  parent1.size.times do |i|
    child << ((rand()<0.5) ? parent1[i].chr : parent2[i].chr)
  end
  return child
end

def reproduce(selected, pop_size, p_cross)
  children = []
  selected.each_with_index do |p1, i|
    p2 = (i.modulo(2)==0) ? selected[i+1] : selected[i-1]
    p2 = selected[0] if i == selected.size-1
    child = {}
    child[:bitstring] = crossover(p1[:bitstring], p2[:bitstring], p_cross)
    child[:bitstring] = point_mutation(child[:bitstring])
    children << child
    break if children.size >= pop_size
  end
  return children
end

def random_bitstring(num_bits)
  return (0...num_bits).inject(""){|s,i| s<<((rand<0.5) ? "1" : "0")}
end

def calculate_objectives(pop, search_space, bits_per_param)
  pop.each do |p|
    p[:vector] = decode(p[:bitstring], search_space, bits_per_param)
    p[:objectives] = []
    p[:objectives] << objective1(p[:vector])
    p[:objectives] << objective2(p[:vector])
  end
end

def dominates?(p1, p2)
  p1[:objectives].each_index do |i|
    return false if p1[:objectives][i] > p2[:objectives][i]
  end
  return true
end

def weighted_sum(x)
  return x[:objectives].inject(0.0) {|sum, x| sum+x}
end
end 
def euclidean_distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

# Record, for every member, the set of other members it dominates.
def calculate_dominated(pop)
  pop.each do |p1|
    p1[:dom_set] = pop.select {|p2| p1!=p2 and dominates?(p1, p2) }
  end
end

# Sum the strengths (dominated-set sizes) of all members that dominate p1.
def calculate_raw_fitness(p1, pop)
  return pop.inject(0.0) do |sum, p2|
    (dominates?(p2, p1)) ? sum + p2[:dom_set].size.to_f : sum
  end
end

# Density estimate from the distance to the k-th entry of the sorted list.
def calculate_density(p1, pop)
  pop.each do |p2|
    p2[:dist] = euclidean_distance(p1[:objectives], p2[:objectives])
  end
  list = pop.sort{|x,y| x[:dist]<=>y[:dist]}
  k = Math.sqrt(pop.size).to_i
  return 1.0 / (list[k][:dist] + 2.0)
end

def calculate_fitness(pop, archive, search_space, bits_per_param)
  calculate_objectives(pop, search_space, bits_per_param)
  union = archive + pop
  calculate_dominated(union)
  union.each do |p|
    p[:raw_fitness] = calculate_raw_fitness(p, union)
    p[:density] = calculate_density(p, union)
    p[:fitness] = p[:raw_fitness] + p[:density]
  end
end

# Build the next archive: take all members with fitness below 1.0, then
# either top up with the next best or truncate by repeatedly removing the
# member with the smallest k-th neighbor distance.
def environmental_selection(pop, archive, archive_size)
  union = archive + pop
  environment = union.select {|p| p[:fitness]<1.0}
  if environment.size < archive_size
    union.sort!{|x,y| x[:fitness]<=>y[:fitness]}
    union.each do |p|
      environment << p if p[:fitness] >= 1.0
      break if environment.size >= archive_size
    end
  elsif environment.size > archive_size
    begin
      k = Math.sqrt(environment.size).to_i
      environment.each do |p1|
        environment.each do |p2|
          p2[:dist] = euclidean_distance(p1[:objectives], p2[:objectives])
        end
        list = environment.sort{|x,y| x[:dist]<=>y[:dist]}
        p1[:density] = list[k][:dist]
      end
      environment.sort!{|x,y| x[:density]<=>y[:density]}
      environment.shift
    end until environment.size <= archive_size
  end
  return environment
end

def search(search_space, max_gens, pop_size, archive_size, p_cross,
    bits_per_param=16)
  pop = Array.new(pop_size) do |i|
    {:bitstring=>random_bitstring(search_space.size*bits_per_param)}
  end
  gen, archive = 0, []
  begin
    calculate_fitness(pop, archive, search_space, bits_per_param)
    archive = environmental_selection(pop, archive, archive_size)
    best = archive.sort{|x,y| weighted_sum(x)<=>weighted_sum(y)}.first
    puts ">gen=#{gen}, objs=#{best[:objectives].join(', ')}"
    break if gen >= max_gens
    selected = Array.new(pop_size){binary_tournament(archive)}
    pop = reproduce(selected, pop_size, p_cross)
    gen += 1
  end while true
  return archive
end

if __FILE__ == $0
  # problem configuration
  problem_size = 1
  search_space = Array.new(problem_size) {|i| [-10, 10]}
  # algorithm configuration
  max_gens = 50
  pop_size = 80
  archive_size = 40
  p_cross = 0.90
  # execute the algorithm
  pop = search(search_space, max_gens, pop_size, archive_size, p_cross)
  puts "done!"
end

Listing 3.10: SPEA2 in Ruby 



3.11.6 References 

Primary Sources 

Zitzler and Thiele introduced the Strength Pareto Evolutionary Algorithm 
as a technical report on a multiple objective optimization algorithm with 
elitism and clustering along the Pareto front [6]. The technical report 
was later published [7]. The Strength Pareto Evolutionary Algorithm was 
developed as a part of Zitzler's PhD thesis [2]. Zitzler, Laumanns, and 
Thiele later extended SPEA to address some inefficiencies of the approach;
the resulting algorithm was called SPEA2 and was released as a technical report [4]
and later published [5]. SPEA2 provides fine-grained fitness assignment, 
density estimation of the Pareto front, and an archive truncation operator. 

Learn More 

Zitzler, Laumanns, and Bleuler provide a tutorial on SPEA2 as a book 
chapter that considers the basics of multiple objective optimization, and the 
differences from SPEA and the other related Multiple Objective Evolutionary 
Algorithms [3]. 
4.1 Overview

This chapter describes Physical Algorithms. 

4.1.1 Physical Properties 

Physical algorithms are those algorithms inspired by a physical process. The 
described physical algorithms generally belong to the fields of Metaheuristics
and Computational Intelligence, although they do not fit neatly into the existing
categories of the biologically inspired techniques (such as Swarm, Immune,
Neural, and Evolution). In this vein, they could just as easily be referred to 
as nature inspired algorithms. 

The inspiring physical systems include metallurgy, music, the interplay
between culture and evolution, and complex dynamic systems such as
avalanches. They are generally stochastic optimization algorithms with a
mixture of local (neighborhood-based) and global search techniques.

4.1.2 Extensions 

There are many other algorithms and classes of algorithm inspired by natural
systems that were not described, not limited to:

• More Annealing: Extensions to the classical Simulated Annealing
algorithm, such as Adaptive Simulated Annealing (formerly Very Fast
Simulated Re-annealing) [3, 4], and Quantum Annealing [1, 2].

• Stochastic tunneling: based on the physical idea of a particle
tunneling through structures [5].
4.2 Simulated Annealing

Simulated Annealing, SA. 

4.2.1 Taxonomy 

Simulated Annealing is a global optimization algorithm that belongs to the 
field of Stochastic Optimization and Metaheuristics. Simulated Annealing 
is an adaptation of the Metropolis-Hastings Monte Carlo algorithm and is 
used in function optimization. Like the Genetic Algorithm (Section 3.2), it 
provides a basis for a large variety of extensions and specializations of the
general method not limited to Parallel Simulated Annealing, Fast Simulated 
Annealing, and Adaptive Simulated Annealing. 

4.2.2 Inspiration 

Simulated Annealing is inspired by the process of annealing in metallurgy. In 
this natural process a material is heated and slowly cooled under controlled 
conditions to increase the size of the crystals in the material and reduce their 
defects. This has the effect of improving the strength and durability of the 
material. The heat increases the energy of the atoms allowing them to move 
freely, and the slow cooling schedule allows a new low-energy configuration 
to be discovered and exploited. 

4.2.3 Metaphor 

Each configuration of a solution in the search space represents a different 
internal energy of the system. Heating the system results in a relaxation of 
the acceptance criteria of the samples taken from the search space. As the 
system is cooled, the acceptance criteria of samples is narrowed to focus on 
improving movements. Once the system has cooled, the configuration will 
represent a sample at or close to a global optimum. 

4.2.4 Strategy 

The information processing objective of the technique is to locate the 
minimum cost configuration in the search space. The algorithm's plan
of action is to probabilistically re-sample the problem space where the 
acceptance of new samples into the currently held sample is managed by a 
probabilistic function that becomes more discerning of the cost of samples it 
accepts over the execution time of the algorithm. This probabilistic decision 
is based on the Metropolis-Hastings algorithm for simulating samples from 
a thermodynamic system. 
4.2.5 Procedure

Algorithm 4.2.1 provides a pseudocode listing of the main Simulated An- 
nealing algorithm for minimizing a cost function. 

Algorithm 4.2.1: Pseudocode for Simulated Annealing.
Input: ProblemSize, iterations_max, temp_max
Output: S_best

S_current ← CreateInitialSolution(ProblemSize);
S_best ← S_current;
for i = 1 to iterations_max do
    S_i ← CreateNeighborSolution(S_current);
    temp_curr ← CalculateTemperature(i, temp_max);
    if Cost(S_i) ≤ Cost(S_current) then
        S_current ← S_i;
        if Cost(S_i) < Cost(S_best) then
            S_best ← S_i;
        end
    else if Exp((Cost(S_current) - Cost(S_i)) / temp_curr) > Rand() then
        S_current ← S_i;
    end
end
return S_best;



4.2.6 Heuristics 

• Simulated Annealing was designed for use with combinatorial optimiza- 
tion problems, although it has been adapted for continuous function 
optimization problems. 

• The convergence proof suggests that with a long enough cooling period, 
the system will always converge to the global optimum. The downside 
of this theoretical finding is that the number of samples taken for 
optimum convergence to occur on some problems may be more than 
a complete enumeration of the search space. 

• Performance improvements can be given with the selection of a can- 
didate move generation scheme (neighborhood) that is less likely to 
generate candidates of significantly higher cost. 

• Restarting the cooling schedule using the best found solution so far 
can lead to an improved outcome on some problems. 

• A common acceptance method is to always accept improving solutions
and accept worse solutions with a probability of
$P(accept) \leftarrow \exp(\frac{e - e'}{T})$, where $T$ is the current
temperature, $e$ is the energy (or cost) of the current solution and $e'$
is the energy of a candidate solution being considered.

• The size of the neighborhood considered in generating candidate 
solutions may also change over time or be influenced by the tempera- 
ture, starting initially broad and narrowing with the execution of the 
algorithm. 

• A problem specific heuristic method can be used to provide the starting 
point for the search. 
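The acceptance rule given in the heuristics above can be sketched as a small function. The name `acceptance_probability` and the sample costs are assumptions made for illustration. The sketch shows how the same worsening move becomes less likely to be accepted as the temperature falls.

```ruby
# Probability of accepting a candidate under the rule described above:
# always accept improvements, otherwise exp((e - e') / T).
def acceptance_probability(current_cost, candidate_cost, temperature)
  return 1.0 if candidate_cost <= current_cost
  Math.exp((current_cost - candidate_cost) / temperature.to_f)
end

# A candidate that is 10 units worse, evaluated at decreasing temperatures.
[100.0, 10.0, 1.0].each do |t|
  puts "T=#{t} P(accept)=#{acceptance_probability(50, 60, t)}"
end
```

In a search loop the returned probability would be compared against a uniform random number, so a high temperature behaves close to a random walk and a low temperature behaves close to hill climbing.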

4.2.7 Code Listing 

Listing 4.1 provides an example of the Simulated Annealing algorithm 
implemented in the Ruby Programming Language. The algorithm is applied 
to the Berlin52 instance of the Traveling Salesman Problem (TSP), taken 
from the TSPLIB. The problem seeks a permutation of the order to visit 
cities (called a tour) that minimizes the total distance traveled. The optimal 
tour distance for the Berlin52 instance is 7542 units.
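The tour cost being minimized can be illustrated on a very small instance. The four coordinates below are hypothetical and are not part of Berlin52. The helper names are chosen for this sketch, and distances are rounded to the nearest integer.

```ruby
# Rounded Euclidean distance between two cities given as [x, y] pairs.
def distance(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2 + (c1[1] - c2[1])**2).round
end

# Length of the closed tour that visits cities in the order given by perm.
def tour_cost(perm, cities)
  perm.each_index.sum do |i|
    distance(cities[perm[i]], cities[perm[(i + 1) % perm.size]])
  end
end

# Hypothetical instance: the corners of a 3 x 4 rectangle.
cities = [[0, 0], [3, 0], [3, 4], [0, 4]]
perimeter = tour_cost([0, 1, 2, 3], cities)
crossing = tour_cost([0, 2, 1, 3], cities)
puts perimeter
puts crossing
```

The tour that walks the perimeter is shorter than the one that crosses both diagonals, which is the kind of difference a neighborhood move such as two-opt is meant to find.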

The algorithm implementation uses a two-opt procedure for the neighborhood
function and the classical $P(accept) \leftarrow \exp(\frac{e - e'}{T})$ as
the acceptance function. A simple cooling regime is used with a large initial
temperature that is multiplied by a constant factor each iteration.
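The listing that follows updates the temperature by multiplying it by a constant factor each iteration, which gives a geometric decay. The sketch below shows that update and its closed form. The values 100000.0 and 0.98 match the configuration in the listing, while the function names are assumptions of this example.

```ruby
# Geometric cooling: the temperature is multiplied by a constant factor
# (0 < factor < 1) at every iteration.
def cool(temp, factor)
  temp * factor
end

# Closed form of the same schedule after a given number of iterations.
def temperature_at(iteration, max_temp, factor)
  max_temp * factor**iteration
end

temp = 100_000.0
10.times { temp = cool(temp, 0.98) }
puts temp
puts temperature_at(10, 100_000.0, 0.98)
puts temperature_at(500, 100_000.0, 0.98)
```

With these settings the temperature falls below 10 within 500 iterations, so most of a 2000-iteration run accepts few worsening moves.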

# Rounded Euclidean distance between two cities.
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# Total length of the closed tour described by the permutation.
def cost(permutation, cities)
  distance = 0
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# Reverse a randomly chosen segment of the tour in place.
def stochastic_two_opt!(perm)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm
end

def create_neighbor(current, cities)
  candidate = {}
  candidate[:vector] = Array.new(current[:vector])
  stochastic_two_opt!(candidate[:vector])
  candidate[:cost] = cost(candidate[:vector], cities)
  return candidate
end

def should_accept?(candidate, current, temp)
  return true if candidate[:cost] <= current[:cost]
  return Math.exp((current[:cost] - candidate[:cost]) / temp) > rand()
end

def search(cities, max_iter, max_temp, temp_change)
  current = {:vector=>random_permutation(cities)}
  current[:cost] = cost(current[:vector], cities)
  temp, best = max_temp, current
  max_iter.times do |iter|
    candidate = create_neighbor(current, cities)
    temp = temp * temp_change
    current = candidate if should_accept?(candidate, current, temp)
    best = candidate if candidate[:cost] < best[:cost]
    if (iter+1).modulo(10) == 0
      puts " > iteration #{(iter+1)}, temp=#{temp}, best=#{best[:cost]}"
    end
  end
  return best
end

if __FILE__ == $0 

# problem configuration 

berlin52 = [[565,575] , [25, 185] , [345,750] , [945,685] , [845,655] , 
[880,660] , [25,230] , [525,1000] , [580,1175] , [650,1130] , [1605,620] , 
[1220,580] , [1465,200] , [1530,5] , [845,680] , [725,370] , [145,665] , 
[415,635] , [510,875] , [560,365] , [300,465] , [520,585] , [480,415] , 
[835,625] , [975,580] , [1215,245] , [1320,315] , [1250,400] , [660,180] , 
[410,250] , [420,555] , [575,665] , [1150,1160] , [700,580] , [685,595] , 
[685,610] , [770,610] , [795,645] , [720,635] , [760,650] , [475,960] , 
[95,260] , [875,920] , [700,500] , [555,815] , [830,485] , [1170,65] , 
[830,610] , [605,625] , [595,360] , [1340,725] , [1740,245]] 

  # algorithm configuration
  max_iterations = 2000
  max_temp = 100000.0
  temp_change = 0.98
  # execute the algorithm
  best = search(berlin52, max_iterations, max_temp, temp_change)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end
Listing 4.1: Simulated Annealing in Ruby



4.2.8 References 
Primary Sources 

Simulated Annealing is credited to Kirkpatrick, Gelatt, and Vecchi in 1983 
[5]. Granville, Krivanek, and Rasson provided the proof for convergence 
for Simulated Annealing in 1994 [2]. There were a number of early studies 
and application papers such as Kirkpatrick's investigation into the TSP 
and minimum cut problems [4], and a study by Vecchi and Kirkpatrick on 
Simulated Annealing applied to the global wiring problem [7]. 

Learn More 

There are many excellent reviews of Simulated Annealing, not limited to 
the review by Ingber that describes improved methods such as Adaptive
Simulated Annealing, Simulated Quenching, and hybrid methods [3]. There 
are books dedicated to Simulated Annealing, applications and variations. 
Two examples of good texts include "Simulated Annealing: Theory and 
Applications" by Laarhoven and Aarts [6] that provides an introduction 
to the technique and applications, and "Simulated Annealing: Paralleliza- 
tion Techniques" by Robert Azencott [1] that focuses on the theory and 
applications of parallel methods for Simulated Annealing. 
4.3 Extremal Optimization

Extremal Optimization, EO. 

4.3.1 Taxonomy 

Extremal Optimization is a stochastic search technique that has the prop- 
erties of being a local and global search method. It is generally related 
to hill-climbing algorithms and provides the basis for extensions such as 
Generalized Extremal Optimization. 

4.3.2 Inspiration 

Extremal Optimization is inspired by the Bak-Sneppen self-organized crit- 
icality model of co-evolution from the field of statistical physics. The 
self-organized criticality model suggests that some dynamical systems have 
a critical point as an attractor, whereby the systems exhibit periods of 
slow movement or accumulation followed by short periods of avalanche or 
instability. Examples of such systems include land formation, earthquakes, 
and the dynamics of sand piles. The Bak-Sneppen model considers these 
dynamics in co-evolutionary systems and in the punctuated equilibrium
model, which is described as long periods of stasis followed by short periods
of extinction and large evolutionary change. 

4.3.3 Metaphor 

The dynamics of the system result in the steady improvement of a candidate 
solution with sudden and large crashes in the quality of the candidate 
solution. These dynamics allow two main phases of activity in the system: 
1) to exploit higher quality solutions in a local search like manner, and 2) 
escape possible local optima with a population crash and explore the search 
space for a new area of high quality solutions. 

4.3.4 Strategy 

The objective of the information processing strategy is to iteratively identify 
the worst performing components of a given solution and replace or swap 
them with other components. This is achieved through the allocation of cost 
to the components of the solution based on their contribution to the overall 
cost of the solution in the problem domain. Once components are assessed 
they can be ranked and the weaker components replaced or switched with a 
randomly selected component. 
4.3.5 Procedure

Algorithm 4.3.1 provides a pseudocode listing of the Extremal Optimization 
algorithm for minimizing a cost function. The deterministic selection of the 
worst component in the SelectWeakComponent function and replacement 
in the SelectReplacementComponent function is classical EO. If these 
decisions are probabilistic, making use of a $\tau$ parameter, this is
referred to as $\tau$-Extremal Optimization.



Algorithm 4.3.1: Pseudocode for Extremal Optimization.
Input: ProblemSize, iterations_max, τ
Output: S_best

S_current ← CreateInitialSolution(ProblemSize);
S_best ← S_current;
for i = 1 to iterations_max do
    foreach Component_i ∈ S_current do
        Component_i.cost ← Cost(Component_i, S_current);
    end
    RankedComponents ← Rank(S_current components);
    Component_i ← SelectWeakComponent(RankedComponents, Component_i, τ);
    Component_j ← SelectReplacementComponent(RankedComponents, τ);
    S_candidate ← Replace(S_current, Component_i, Component_j);
    if Cost(S_candidate) < Cost(S_best) then
        S_best ← S_candidate;
    end
end
return S_best;



4.3.6 Heuristics 

• Extremal Optimization was designed for combinatorial optimization 
problems, although variations have been applied to continuous function 
optimization. 

• The selection of the worst component and the replacement component
each iteration can be deterministic or probabilistic, the latter of
which is referred to as $\tau$-Extremal Optimization given the use of a
$\tau$ parameter.

• The selection of an appropriate scoring function of the components of 
a solution is the most difficult part in the application of the technique. 
• For $\tau$-Extremal Optimization, low $\tau$ values (such as
$\tau \in [1.2, 1.6]$) have been found to be effective for the TSP.
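The probabilistic selection used by the τ variant can be sketched as a power law over ranks. This is a minimal illustration under the assumption that rank 1 is the component most likely to be selected. The function names and the choice of five components are hypothetical.

```ruby
# Selection probability for each rank under a power law:
# P(rank) is proportional to rank ** -tau, normalized to sum to one.
def rank_probabilities(num_components, tau)
  weights = (1..num_components).map { |rank| rank.to_f**(-tau) }
  total = weights.sum
  weights.map { |w| w / total }
end

# Pick a rank given a uniform random number u in [0, 1) (roulette wheel).
def select_rank(probabilities, u)
  cumulative = 0.0
  probabilities.each_with_index do |p, i|
    cumulative += p
    return i + 1 if u < cumulative
  end
  probabilities.size
end

probs = rank_probabilities(5, 1.4)
puts probs.map { |p| p.round(3) }.inspect
puts select_rank(probs, 0.0)
puts select_rank(probs, 0.99)
```

Larger τ values concentrate the probability on the first ranks and approach the deterministic choice, while τ close to zero approaches uniform random selection.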

4.3.7 Code Listing 

Listing 4.2 provides an example of the Extremal Optimization algorithm 
implemented in the Ruby Programming Language. The algorithm is applied 
to the Berlin52 instance of the Traveling Salesman Problem (TSP), taken 
from the TSPLIB. The problem seeks a permutation of the order to visit 
cities (called a tour) that minimizes the total distance traveled. The optimal 
tour distance for the Berlin52 instance is 7542 units.

The algorithm implementation is based on the seminal work by Boettcher 
and Percus [5]. A solution is comprised of a permutation of city components.
Each city can potentially form a connection to any other city, and the
connections to other cities ordered by distance may be considered its
neighborhood. For a given candidate solution, the city components of a
solution are scored based on the neighborhood rank of the cities to which
they are connected: $fitness_k \leftarrow \frac{3}{r_i + r_j}$, where $r_i$
and $r_j$ are the neighborhood ranks of cities $i$ and $j$ against city $k$.
A city is selected for modification probabilistically, where the probability
of selecting a given city is proportional to $n_i^{-\tau}$, where $n_i$ is
the rank of city $i$. The longest connection is broken, and the city is
connected with another neighboring city that is also probabilistically
selected.
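The city scoring described above can be sketched on a toy instance. The four collinear cities are hypothetical, and the helper names `neighbor_ranks` and `city_fitness` are assumptions of this example, not the functions used in the listing.

```ruby
# Rank of every other city by distance from city k (1 = nearest).
def neighbor_ranks(k, cities)
  others = cities.each_index.reject { |i| i == k }
  ordered = others.sort_by do |i|
    Math.hypot(cities[i][0] - cities[k][0], cities[i][1] - cities[k][1])
  end
  Hash[ordered.each_with_index.map { |city, idx| [city, idx + 1] }]
end

# Fitness of city k given the two cities it is connected to in the tour.
def city_fitness(k, left, right, cities)
  ranks = neighbor_ranks(k, cities)
  3.0 / (ranks[left] + ranks[right])
end

# Hypothetical instance: four cities on a line.
cities = [[0, 0], [1, 0], [3, 0], [7, 0]]
near = city_fitness(0, 1, 2, cities)  # joined to its two nearest neighbors
far = city_fitness(0, 2, 3, cities)   # joined to its two farthest neighbors
puts near
puts far
```

A city joined to its first and second nearest neighbors scores 3 / (1 + 2) = 1, the maximum, and the score falls as its connections move to more distant neighbors.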

def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

def cost(permutation, cities)
  distance = 0
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# List all other cities ordered by distance from the given city.
def calculate_neighbor_rank(city_number, cities, ignore=[])
  neighbors = []
  cities.each_with_index do |city, i|
    next if i==city_number or ignore.include?(i)
    neighbor = {:number=>i}
    neighbor[:distance] = euc_2d(cities[city_number], city)
    neighbors << neighbor
  end
  return neighbors.sort!{|x,y| x[:distance] <=> y[:distance]}
end

# Return the two cities adjacent to the given city in the tour.
def get_edges_for_city(city_number, permutation)
  c1, c2 = nil, nil
  permutation.each_with_index do |c, i|
    if c == city_number
      c1 = (i==0) ? permutation.last : permutation[i-1]
      c2 = (i==permutation.size-1) ? permutation.first : permutation[i+1]
      break
    end
  end
  return [c1, c2]
end

# Fitness of one city: 3 / (sum of the neighborhood ranks of its two edges).
def calculate_city_fitness(permutation, city_number, cities)
  c1, c2 = get_edges_for_city(city_number, permutation)
  neighbors = calculate_neighbor_rank(city_number, cities)
  n1, n2 = -1, -1
  neighbors.each_with_index do |neighbor, i|
    n1 = i+1 if neighbor[:number] == c1
    n2 = i+1 if neighbor[:number] == c2
    break if n1!=-1 and n2!=-1
  end
  return 3.0 / (n1.to_f + n2.to_f)
end

# Fitness of every city, ordered from best to worst.
def calculate_city_fitnesses(cities, permutation)
  city_fitnesses = []
  cities.each_with_index do |city, i|
    city_fitness = {:number=>i}
    city_fitness[:fitness] = calculate_city_fitness(permutation, i, cities)
    city_fitnesses << city_fitness
  end
  return city_fitnesses.sort!{|x,y| y[:fitness] <=> x[:fitness]}
end

# Assign each component a weight of rank**(-tau); returns the sum of weights.
def calculate_component_probabilities(ordered_components, tau)
  sum = 0.0
  ordered_components.each_with_index do |component, i|
    component[:prob] = (i+1.0)**(-tau)
    sum += component[:prob]
  end
  return sum
end

# Roulette-wheel selection over the normalized component weights.
def make_selection(components, sum_probability)
  selection = rand()
  components.each_with_index do |component, i|
    selection -= (component[:prob] / sum_probability)
    return component[:number] if selection <= 0.0
  end
  return components.last[:number]
end

# Repeat the selection until a component outside the exclude list is drawn.
def probabilistic_selection(ordered_components, tau, exclude=[])
  sum = calculate_component_probabilities(ordered_components, tau)
  selected_city = nil
  begin
    selected_city = make_selection(ordered_components, sum)
  end while exclude.include?(selected_city)
  return selected_city
end

# Reverse the tour segment between the selected city and its new neighbor.
def vary_permutation(permutation, selected, new, long_edge)
  perm = Array.new(permutation)
  c1, c2 = perm.rindex(selected), perm.rindex(new)
  p1, p2 = (c1<c2) ? [c1,c2] : [c2,c1]
  right = (c1==perm.size-1) ? 0 : c1+1
  if perm[right] == long_edge
    perm[p1+1..p2] = perm[p1+1..p2].reverse
  else
    perm[p1...p2] = perm[p1...p2].reverse
  end
  return perm
end

# The more distant of the two cities currently connected to the selected city.
def get_long_edge(edges, neighbor_distances)
  n1 = neighbor_distances.find {|x| x[:number]==edges[0]}
  n2 = neighbor_distances.find {|x| x[:number]==edges[1]}
  return (n1[:distance] > n2[:distance]) ? n1[:number] : n2[:number]
end

# Select a poorly scoring city, pick a new neighbor, and rewire the tour.
def create_new_perm(cities, tau, perm)
  city_fitnesses = calculate_city_fitnesses(cities, perm)
  selected_city = probabilistic_selection(city_fitnesses.reverse, tau)
  edges = get_edges_for_city(selected_city, perm)
  neighbors = calculate_neighbor_rank(selected_city, cities)
  new_neighbor = probabilistic_selection(neighbors, tau, edges)
  long_edge = get_long_edge(edges, neighbors)
  return vary_permutation(perm, selected_city, new_neighbor, long_edge)
end

# Main loop: every candidate is accepted; the best tour seen is remembered.
def search(cities, max_iterations, tau)
  current = {:vector=>random_permutation(cities)}
  current[:cost] = cost(current[:vector], cities)
  best = current
  max_iterations.times do |iter|
    candidate = {}
    candidate[:vector] = create_new_perm(cities, tau, current[:vector])
    candidate[:cost] = cost(candidate[:vector], cities)
    current = candidate
    best = candidate if candidate[:cost] < best[:cost]
    puts " > iter #{(iter+1)}, curr=#{current[:cost]}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iterations = 250
  tau = 1.8
  # execute the algorithm
  best = search(berlin52, max_iterations, tau)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 4.2: Extremal Optimization in Ruby 



4.3.8 References 

Primary Sources 

Extremal Optimization was proposed as an optimization heuristic by Boettcher 
and Percus applied to graph partitioning and the Traveling Salesman Prob- 
lem [5]. The approach was inspired by the Bak-Sneppen self-organized 
criticality model of co-evolution [1, 2]. 

Learn More 

A number of detailed reviews of Extremal Optimization have been presented, 
including a review and studies by Boettcher and Percus [4], an accessible 
review by Boettcher [3], and a focused study on the Spin Glass problem by 
Boettcher and Percus [6].

4.4 Harmony Search

Harmony Search, HS.

4.4.1 Taxonomy 

Harmony Search belongs to the fields of Computational Intelligence and
Metaheuristics.

4.4.2 Inspiration 

Harmony Search was inspired by the improvisation of Jazz musicians,
specifically the process by which the musicians (who may have never played
together before) rapidly refine their individual improvisation through
variation, resulting in an aesthetic harmony.

4.4.3 Metaphor 

Each musician corresponds to an attribute in a candidate solution from a
problem domain, and each instrument's pitch and range correspond to the
bounds and constraints on the decision variable. The harmony between the
musicians is taken as a complete candidate solution at a given time, and
the audience's aesthetic appreciation of the harmony represents the
problem-specific cost function. The musicians seek harmony over time through
small variations and improvisations, which results in an improvement against
the cost function.

4.4.4 Strategy 

The information processing objective of the technique is to use good
candidate solutions already discovered to influence the creation of new
candidate solutions toward locating the problem's optima. This is achieved by
stochastically creating candidate solutions in a step-wise manner, where each
component is either drawn randomly from a memory of high-quality
solutions, adjusted from the memory of high-quality solutions, or assigned
randomly within the bounds of the problem. The memory of candidate
solutions is initially random, and a greedy acceptance criterion is used to
admit new candidate solutions only if they have an improved objective value,
replacing an existing member.

4.4.5 Procedure 

Algorithm 4.4.1 provides a pseudocode listing of the Harmony Search
algorithm for minimizing a cost function. The adjustment of a pitch selected
from the harmony memory is typically linear, for example for continuous
function optimization:

$x' \leftarrow x + range \times \epsilon$    (4.1)

where $range$ is a user parameter (pitch bandwidth) to control the size
of the changes, and $\epsilon$ is a uniformly random number $\in [-1, 1]$.
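The linear adjustment in Equation 4.1 might be sketched as follows. The function name `adjust_pitch`, the optional `rng` argument, and the clamping of the result to the variable's bounds are assumptions of this sketch rather than part of the equation.

```ruby
# Sketch of x' = x + range * epsilon, with epsilon drawn uniformly
# from [-1, 1). Clamping to [min, max] is an assumption of this sketch.
def adjust_pitch(x, range, min, max, rng = Random.new)
  epsilon = (2.0 * rng.rand) - 1.0
  adjusted = x + (range * epsilon)
  [[adjusted, min].max, max].min
end

rng = Random.new(1)
# A pitch near the upper bound can never leave the interval [-5, 5].
puts adjust_pitch(4.98, 0.05, -5.0, 5.0, rng)
```

A small bandwidth keeps the adjusted pitch close to the value taken from memory, so the adjustment acts as a local refinement of remembered solutions.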



Algorithm 4.4.1: Pseudocode for Harmony Search.

Input: Pitch_num, Pitch_bounds, Memory_size, Consolidation_rate,
       PitchAdjust_rate, Improvisation_max
Output: Harmony_best

Harmonies ← InitializeHarmonyMemory(Pitch_num, Pitch_bounds, Memory_size);
EvaluateHarmonies(Harmonies);
for i to Improvisation_max do
    Harmony ← ∅;
    foreach Pitch_i ∈ Pitch_num do
        if Rand() < Consolidation_rate then
            RandomHarmony^i_pitch ← SelectRandomHarmonyPitch(Harmonies, Pitch_i);
            if Rand() < PitchAdjust_rate then
                Harmony^i_pitch ← AdjustPitch(RandomHarmony^i_pitch);
            else
                Harmony^i_pitch ← RandomHarmony^i_pitch;
            end
        else
            Harmony^i_pitch ← RandomPitch(Pitch_bounds);
        end
    end
    EvaluateHarmonies(Harmony);
    if Cost(Harmony) < Cost(Worst(Harmonies)) then
        Worst(Harmonies) ← Harmony;
    end
end
return Harmony_best;



4.4.6 Heuristics

• Harmony Search was designed as a generalized optimization method
for continuous, discrete, and constrained optimization and has been
applied to numerous types of optimization problems.

• The harmony memory considering rate (HMCR) ∈ [0, 1] controls the
use of information from the harmony memory or the generation of
a random pitch. As such, it controls the rate of convergence of the
algorithm and is typically configured ∈ [0.7, 0.95].

• The pitch adjustment rate (PAR) ∈ [0, 1] controls the frequency of
adjustment of pitches selected from harmony memory, typically
configured ∈ [0.1, 0.5]. High values can result in the premature convergence
of the search.

• The pitch adjustment rate and the adjustment method (amount of 
adjustment or fret width) are typically fixed, having a linear effect 
through time. Non-linear methods have been considered, for example 
refer to Geem [4]. 

• When creating a new harmony, aggregations of pitches can be taken 
from across musicians in the harmony memory. 

• The harmony memory update is typically a greedy process, although 
other considerations such as diversity may be used where the most 
similar harmony is replaced. 
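The diversity-oriented memory update mentioned in the last heuristic could look something like the sketch below. It is not taken from the listing or from the cited sources: the function names are hypothetical, Euclidean distance is assumed as the similarity measure, and minimization is assumed.

```ruby
# Hypothetical sketch of a diversity-aware memory update: the new harmony
# replaces the most similar member of memory, and only if it is better.
def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).inject(0.0) { |sum, (x, y)| sum + (x - y)**2 })
end

def replace_most_similar!(memory, harmony)
  closest = memory.min_by { |m| euclidean_distance(m[:vector], harmony[:vector]) }
  if harmony[:fitness] < closest[:fitness]
    memory[memory.index(closest)] = harmony
    return true
  end
  false
end

memory = [
  { :vector => [0.0, 0.0], :fitness => 0.0 },
  { :vector => [3.0, 3.0], :fitness => 18.0 }
]
candidate = { :vector => [2.0, 2.0], :fitness => 8.0 }
# The candidate is closest to [3.0, 3.0] and improves on it, so it replaces it.
puts replace_most_similar!(memory, candidate)
```

Compared with the greedy rule that always replaces the worst member, this variant trades some selection pressure for a wider spread of remembered solutions.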

4.4.7 Code Listing 

Listing 4.3 provides an example of the Harmony Search algorithm
implemented in the Ruby Programming Language. The demonstration problem is
an instance of a continuous function optimization that seeks $\min f(x)$ where
$f = \sum_{i=1}^{n} x_i^2$, $-5.0 \leq x_i \leq 5.0$ and $n = 3$. The optimal solution for this
basin function is $(v_0, \ldots, v_{n-1}) = 0.0$. The algorithm implementation and
parameterization are based on the description by Yang [7], with refinement
from Geem [4].

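# Sum of squares (basin function); the minimum is 0.0 at the origin.
def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

# Uniform random value in [min, max).
def rand_in_bounds(min, max)
  return min + ((max-min) * rand())
end

# Random point within the per-dimension bounds of the search space.
def random_vector(search_space)
  return Array.new(search_space.size) do |i|
    rand_in_bounds(search_space[i][0], search_space[i][1])
  end
end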

def create_random_harmony(search_space)
  harmony = {}
  harmony[:vector] = random_vector(search_space)
  harmony[:fitness] = objective_function(harmony[:vector])
  return harmony
end

# Create factor*mem_size random harmonies and keep the best mem_size.
def initialize_harmony_memory(search_space, mem_size, factor=3)
  memory = Array.new(mem_size*factor){create_random_harmony(search_space)}
  memory.sort!{|x,y| x[:fitness]<=>y[:fitness]}
  return memory.first(mem_size)
end

# Build a new harmony one component at a time: take a value from memory
# (optionally adjusted and clamped to the bounds) or draw a random value.
def create_harmony(search_space, memory, consid_rate, adjust_rate, range)
  vector = Array.new(search_space.size)
  search_space.size.times do |i|
    if rand() < consid_rate
      value = memory[rand(memory.size)][:vector][i]
      value = value + range*rand_in_bounds(-1.0, 1.0) if rand()<adjust_rate
      value = search_space[i][0] if value < search_space[i][0]
      value = search_space[i][1] if value > search_space[i][1]
      vector[i] = value
    else
      vector[i] = rand_in_bounds(search_space[i][0], search_space[i][1])
    end
  end
  return {:vector=>vector}
end

# Main loop: add each new harmony to memory, then drop the worst member.
def search(bounds, max_iter, mem_size, consid_rate, adjust_rate, range)
  memory = initialize_harmony_memory(bounds, mem_size)
  best = memory.first
  max_iter.times do |iter|
    harm = create_harmony(bounds, memory, consid_rate, adjust_rate, range)
    harm[:fitness] = objective_function(harm[:vector])
    best = harm if harm[:fitness] < best[:fitness]
    memory << harm
    memory.sort!{|x,y| x[:fitness]<=>y[:fitness]}
    memory.delete_at(memory.size-1)
    puts " > iteration=#{iter}, fitness=#{best[:fitness]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 3
  bounds = Array.new(problem_size) {|i| [-5, 5]}
  # algorithm configuration
  mem_size = 20
  consid_rate = 0.95
  adjust_rate = 0.7
  range = 0.05
  max_iter = 500
  # execute the algorithm
  best = search(bounds, max_iter, mem_size, consid_rate, adjust_rate, range)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:vector].inspect}"
end

Listing 4.3: Harmony Search in Ruby

4.4.8 References

Primary Sources 

Geem et al. proposed the Harmony Search algorithm in 2001, which was
applied to a range of optimization problems including constraint
optimization, the Traveling Salesman problem, and the design of a water supply
network [6].

Learn More 

A book on Harmony Search, edited by Geem, provides a collection of papers
on the technique and its applications [2]; chapter 1 provides a useful summary
of the method and heuristics for its configuration [7]. Similarly, a second
edited volume by Geem focuses on studies that provide more advanced
applications of the approach [5], and chapter 1 provides a detailed walkthrough
of the technique itself [4]. Geem also provides a treatment of Harmony Search
applied to the optimal design of water distribution networks [3] and edits
yet a third volume of papers related to the application of the technique to
structural design optimization problems [1].

4.4.9 Bibliography 

[1] Z. W. Geem, editor. Harmony Search Algorithms for Structural Design
Optimization. Springer, 2009.
Applications, chapter Harmony Search as a Metaheuristic, pages 1-14.

4.5 Cultural Algorithm

Cultural Algorithm, CA. 

4.5.1 Taxonomy 

The Cultural Algorithm is an extension to the field of Evolutionary
Computation and may be considered a Meta-Evolutionary Algorithm. It more broadly
belongs to the field of Computational Intelligence and Metaheuristics. It is
related to other high-order extensions of Evolutionary Computation such as
the Memetic Algorithm (Section 4.6).

4.5.2 Inspiration 

The Cultural Algorithm is inspired by the principle of cultural evolution. 
Culture includes the habits, knowledge, beliefs, customs, and morals of a 
member of society. Culture does not exist independent of the environment, 
and can interact with the environment via positive or negative feedback 
cycles. The study of the interaction of culture in the environment is referred 
to as Cultural Ecology. 

4.5.3 Metaphor 

The Cultural Algorithm may be explained in the context of the inspiring
system. As the evolutionary process unfolds, individuals accumulate
information about the world which is communicated to other individuals in the
population. Collectively this corpus of information is a knowledge base that
members of the population may tap into and exploit. Positive feedback
mechanisms can occur where cultural knowledge indicates useful areas of
the environment, information which is passed down between generations,
exploited, refined, and adapted as situations change. Additionally, areas of
potential hazard may also be communicated through the cultural knowledge
base.

4.5.4 Strategy 

The information processing objective of the algorithm is to improve the
learning or convergence of an embedded search technique (typically an
evolutionary algorithm) using a higher-order cultural evolution. The
algorithm operates at two levels: a population level and a cultural level. The
population level is like an evolutionary search, where individuals
represent candidate solutions, are mostly distinct, and their characteristics are
translated into an objective or cost function in the problem domain. The
second level is the knowledge or belief space where information acquired
by generations is stored, and which is accessible to the current generation.
A communication protocol defines how the two spaces interact and
the types of information that can be exchanged.



4.5.5 Procedure 

The focus of the algorithm is the KnowledgeBase data structure that records
different knowledge types based on the nature of the problem. For example,
the structure may be used to record the best candidate solution found as well
as generalized information about areas of the search space that are expected
to pay off (result in good candidate solutions). This cultural knowledge is
discovered by the population-based evolutionary search, and is in turn used
to influence subsequent generations. The acceptance function constrains the
communication of knowledge from the population to the knowledge base.

Algorithm 4.5.1 provides a pseudocode listing of the Cultural Algorithm. 
The algorithm is abstract, providing flexibility in the interpretation of 
the processes such as the acceptance of information, the structure of the 
knowledge base, and the specific embedded evolutionary algorithm. 
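As one possible reading of the knowledge base and acceptance function described above, the sketch below keeps a single best solution as situational knowledge and per-dimension bounds as normative knowledge. The function names and the greedy "accept the best few" rule are assumptions made for illustration, and they differ from the names used in Listing 4.4.

```ruby
# Illustrative sketch only: a knowledge base holding situational knowledge
# (best solution so far) and normative knowledge (per-dimension bounds).
# Acceptance function (assumed greedy): the num_accepted best individuals.
def accept(population, num_accepted)
  population.sort_by { |c| c[:fitness] }.first(num_accepted)
end

def update_knowledge!(knowledge, population, num_accepted)
  accepted = accept(population, num_accepted)
  best = accepted.first
  # Situational knowledge is only replaced by an improvement.
  if knowledge[:situational].nil? || best[:fitness] < knowledge[:situational][:fitness]
    knowledge[:situational] = best
  end
  # Normative knowledge: the range spanned by the accepted individuals.
  knowledge[:normative] = Array.new(best[:vector].size) do |i|
    values = accepted.map { |c| c[:vector][i] }
    [values.min, values.max]
  end
  knowledge
end

population = [
  { :vector => [1.0, -2.0], :fitness => 5.0 },
  { :vector => [0.5, 0.5], :fitness => 0.5 },
  { :vector => [-1.0, 1.0], :fitness => 2.0 },
  { :vector => [4.0, 4.0], :fitness => 32.0 }
]
knowledge = update_knowledge!({ :situational => nil, :normative => nil }, population, 2)
puts knowledge[:normative].inspect  # prints [[-1.0, 0.5], [0.5, 1.0]]
```

New candidate solutions can then be sampled from inside the normative bounds, which is how the knowledge base influences subsequent generations in this reading.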



Algorithm 4.5.1: Pseudocode for the Cultural Algorithm.

Input: Problem_size, Population_num
Output: KnowledgeBase

Population ← InitializePopulation(Problem_size, Population_num);
KnowledgeBase ← InitializeKnowledgebase(Problem_size, Population_num);
while ¬StopCondition() do
    Evaluate(Population);
    SituationalKnowledge_candidate ← AcceptSituationalKnowledge(Population);
    UpdateSituationalKnowledge(KnowledgeBase, SituationalKnowledge_candidate);
    Children ← ReproduceWithInfluence(Population, KnowledgeBase);
    Population ← Select(Children, Population);
    NormativeKnowledge_candidate ← AcceptNormativeKnowledge(Population);
    UpdateNormativeKnowledge(KnowledgeBase, NormativeKnowledge_candidate);
end
return KnowledgeBase;

4.5.6 Heuristics

• The Cultural Algorithm was initially used as a simulation tool to
investigate Cultural Ecology. It has been adapted for use as an
optimization algorithm for a wide variety of domains including, but not
limited to, constraint optimization, combinatorial optimization, and
continuous function optimization.

• The knowledge base structure provides a mechanism for incorporating 
problem-specific information into the execution of an evolutionary 
search. 

• The acceptance functions that control the flow of information into 
the knowledge base are typically greedy, only including the best 
information from the current generation, and not replacing existing 
knowledge unless it is an improvement. 

• Acceptance functions are traditionally deterministic, although proba- 
bilistic and fuzzy acceptance functions have been investigated. 

4.5.7 Code Listing 

Listing 4.4 provides an example of the Cultural Algorithm implemented
in the Ruby Programming Language. The demonstration problem is an
instance of a continuous function optimization that seeks $\min f(x)$ where
$f = \sum_{i=1}^{n} x_i^2$, $-5.0 \leq x_i \leq 5.0$ and $n = 2$. The optimal solution for this
basin function is $(v_0, \ldots, v_{n-1}) = 0.0$.

The Cultural Algorithm was implemented based on the description of the
Cultural Algorithm Evolutionary Program (CAEP) presented by Reynolds
[4]. A real-valued Genetic Algorithm was used as the embedded evolutionary
algorithm. The overall best solution is taken as the 'situational' cultural
knowledge, whereas the bounds of the top 20% of the best solutions each
generation are taken as the 'normative' cultural knowledge. The situational
knowledge is returned as the result of the search, whereas the normative
knowledge is used to influence the evolutionary process. Specifically, vector
bounds in the normative knowledge are used to define a subspace from which
new candidate solutions are uniformly sampled during the reproduction
step of the evolutionary algorithm's variation mechanism. A real-valued
representation and a binary tournament selection strategy are used by the
evolutionary algorithm.

def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

def rand_in_bounds(min, max)
  return min + ((max-min) * rand())
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    rand_in_bounds(minmax[i][0], minmax[i][1])
  end
end

# Sample a new vector uniformly from the normative bounds, clamped to the
# bounds of the search space.
def mutate_with_inf(candidate, beliefs, minmax)
  v = Array.new(candidate[:vector].size)
  candidate[:vector].each_with_index do |c,i|
    v[i]=rand_in_bounds(beliefs[:normative][i][0],beliefs[:normative][i][1])
    v[i] = minmax[i][0] if v[i] < minmax[i][0]
    v[i] = minmax[i][1] if v[i] > minmax[i][1]
  end
  return {:vector=>v}
end

# Return the fitter of two distinct, randomly chosen individuals.
def binary_tournament(pop)
  i, j = rand(pop.size), rand(pop.size)
  j = rand(pop.size) while j==i
  return (pop[i][:fitness] < pop[j][:fitness]) ? pop[i] : pop[j]
end

# The normative knowledge starts as a copy of the search space bounds.
def initialize_beliefspace(search_space)
  belief_space = {}
  belief_space[:situational] = nil
  belief_space[:normative] = Array.new(search_space.size) do |i|
    Array.new(search_space[i])
  end
  return belief_space
end

# Keep the best solution seen so far as the situational knowledge.
def update_beliefspace_situational!(belief_space, best)
  curr_best = belief_space[:situational]
  if curr_best.nil? or best[:fitness] < curr_best[:fitness]
    belief_space[:situational] = best
  end
end

# Set each dimension's bounds to the range spanned by the accepted solutions.
def update_beliefspace_normative!(belief_space, acc)
  belief_space[:normative].each_with_index do |bounds,i|
    bounds[0] = acc.min{|x,y| x[:vector][i]<=>y[:vector][i]}[:vector][i]
    bounds[1] = acc.max{|x,y| x[:vector][i]<=>y[:vector][i]}[:vector][i]
  end
end

def search(max_gens, search_space, pop_size, num_accepted)
  # initialize
  pop = Array.new(pop_size) { {:vector=>random_vector(search_space)} }
  belief_space = initialize_beliefspace(search_space)
  # evaluate
  pop.each{|c| c[:fitness] = objective_function(c[:vector])}
  best = pop.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
  # update situational knowledge
  update_beliefspace_situational!(belief_space, best)
  max_gens.times do |gen|
    # create next generation
    children = Array.new(pop_size) do |i|
      mutate_with_inf(pop[i], belief_space, search_space)
    end
    # evaluate
    children.each{|c| c[:fitness] = objective_function(c[:vector])}
    best = children.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
    # update situational knowledge
    update_beliefspace_situational!(belief_space, best)
    # select next generation
    pop = Array.new(pop_size) { binary_tournament(children + pop) }
    # update normative knowledge
    pop.sort!{|x,y| x[:fitness] <=> y[:fitness]}
    accepted = pop[0...num_accepted]
    update_beliefspace_normative!(belief_space, accepted)
    # user feedback
    puts " > generation=#{gen}, f=#{belief_space[:situational][:fitness]}"
  end
  return belief_space[:situational]
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_gens = 200
  pop_size = 100
  num_accepted = (pop_size*0.20).round
  # execute the algorithm
  best = search(max_gens, search_space, pop_size, num_accepted)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:vector].inspect}"
end

Listing 4.4: Cultural Algorithm in Ruby 



4.5.8 References

Primary Sources

The Cultural Algorithm was proposed by Reynolds in 1994, combining
the method with the Version Space Algorithm (a binary string based Genetic
Algorithm), where generalizations of individual solutions were communicated
as cultural knowledge in the form of schema patterns (strings of 1's, 0's and
#'s, where '#' represents a wildcard) [3].

Learn More 

Chung and Reynolds provide a study of the Cultural Algorithm on a
testbed of constraint satisfaction problems [1]. Reynolds provides a detailed
overview of the history of the technique as a book chapter that presents
the state of the art and summaries of application areas including concept
learning and continuous function optimization [4]. Coello Coello and Becerra
proposed a variation of the Cultural Algorithm that uses Evolutionary
Programming as the embedded weak search method, for use with
Multi-Objective Optimization problems [2].

4.6 Memetic Algorithm

Memetic Algorithm, MA. 

4.6.1 Taxonomy 

Memetic Algorithms have elements of Metaheuristics and Computational
Intelligence. Although they have principles of Evolutionary Algorithms, they
may not strictly be considered an Evolutionary Technique. Memetic
Algorithms have functional similarities to Baldwinian Evolutionary Algorithms,
Lamarckian Evolutionary Algorithms, Hybrid Evolutionary Algorithms, and
Cultural Algorithms (Section 4.5). Using ideas of memes and Memetic
Algorithms in optimization may be referred to as Memetic Computing.

4.6.2 Inspiration 

Memetic Algorithms are inspired by the interplay of genetic evolution and 
memetic evolution. Universal Darwinism is the generalization of genes 
beyond biological-based systems to any system where discrete units of 
information can be inherited and be subjected to evolutionary forces of 
selection and variation. The term 'meme' is used to refer to a piece of
discrete cultural information, suggesting the interplay of genetic and
cultural evolution.

4.6.3 Metaphor 

The genotype is evolved based on the interaction the phenotype has with
the environment. This interaction is mediated by cultural phenomena that
influence the selection mechanisms, and even the pairing and recombination
mechanisms. Cultural information is shared between individuals, spreading
through the population as memes relative to their fitness or the fitness the
memes impart to the individuals. Collectively, the interplay of the genotype
and the memotype strengthens the fitness of the population in the environment.

4.6.4 Strategy 

The objective of the information processing strategy is to exploit a
population-based global search technique to broadly locate good areas of the
search space, combined with the repeated usage of a local search heuristic
by individual solutions to locate local optima. Ideally, memetic
algorithms embrace the duality of genetic and cultural evolution, allowing the
transmission, selection, inheritance, and variation of memes as well as genes.

4.6.5 Procedure

Algorithm 4.6.1 provides a pseudocode listing of the Memetic Algorithm for
minimizing a cost function. The procedure describes a simple or first-order
Memetic Algorithm that shows the improvement of individual solutions
separate from a global search, although it does not show the independent
evolution of memes.



Algorithm 4.6.1: Pseudocode for the Memetic Algorithm.

Input: ProblemSize, Pop_size, MemePop_size
Output: S_best

Population ← InitializePopulation(ProblemSize, Pop_size);
while ¬StopCondition() do
    foreach S_i ∈ Population do
        Si_cost ← Cost(S_i);
    end
    S_best ← GetBestSolution(Population);
    Population ← StochasticGlobalSearch(Population);
    MemeticPopulation ← SelectMemeticPopulation(Population, MemePop_size);
    foreach S_i ∈ MemeticPopulation do
        S_i ← LocalSearch(S_i);
    end
end
return S_best;
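The memetic step of the pseudocode (SelectMemeticPopulation followed by LocalSearch) could be sketched as below. Choosing the lowest-cost individuals as the memetic population is an assumption of this sketch, since the pseudocode leaves the selection rule open; the names `refine!` and `halve` and the toy cost function are likewise illustrative only.

```ruby
# Assumed selection rule: the meme_pop_size lowest-cost individuals.
def select_memetic_population(population, meme_pop_size)
  population.sort_by { |s| s[:cost] }.first(meme_pop_size)
end

# Apply the supplied local search to each selected individual and write
# the refined solution back into the population.
def refine!(population, meme_pop_size, local_search)
  select_memetic_population(population, meme_pop_size).each do |s|
    improved = local_search.call(s)
    population[population.index(s)] = improved
  end
  population
end

# Toy local search for f(x) = x^2: halving x never increases the cost.
halve = lambda do |s|
  x = s[:x] / 2.0
  { :x => x, :cost => x**2 }
end

population = [4.0, -1.0, 2.0].map { |x| { :x => x, :cost => x**2 } }
refine!(population, 2, halve)
puts population.map { |s| s[:x] }.inspect  # prints [4.0, -0.5, 1.0]
```

Passing the local search as a lambda keeps the global and local mechanisms separate, so either can be swapped without touching the other.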



4.6.6 Heuristics 

• The global search provides the broad exploration mechanism, whereas 
the individual solution improvement via local search provides an 
exploitation mechanism. 

• Balance is needed between the local and global mechanisms to ensure 
the system does not prematurely converge to a local optimum and 
does not consume unnecessary computational resources. 

• The local search should be problem and representation specific, whereas
the global search may be generic and non-specific (black-box).

• Memetic Algorithms have been applied to a range of constraint,
combinatorial, and continuous function optimization problem domains.

4.6.7 Code Listing

Listing 4.5 provides an example of the Memetic Algorithm implemented
in the Ruby Programming Language. The demonstration problem is an
instance of a continuous function optimization that seeks $\min f(x)$ where
$f = \sum_{i=1}^{n} x_i^2$, $-5.0 \leq x_i \leq 5.0$ and $n = 3$. The optimal solution for this
basin function is $(v_0, \ldots, v_{n-1}) = 0.0$. The Memetic Algorithm uses a
canonical Genetic Algorithm as the global search technique that operates
on binary strings, uses tournament selection, point mutations, uniform
crossover and a binary decoding of bits to real values. A
bit climber local search is used that performs probabilistic bit flips (point
mutations) and only accepts solutions with the same or improving fitness.
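The decoding of bits to real values can be illustrated for a single parameter. The helper `decode_param` is hypothetical and only sketches the mapping: the bits are read as an unsigned binary integer (most significant bit first) and scaled linearly onto the interval [min, max].

```ruby
# Hypothetical helper: map a bitstring onto [min, max].
# All zeros decode to min and all ones decode to max.
def decode_param(bits, min, max)
  integer = bits.to_i(2)
  largest = (2**bits.size) - 1
  min + ((max - min) / largest.to_f) * integer
end

puts decode_param("0000", -5.0, 5.0)  # the lower bound
puts decode_param("1111", -5.0, 5.0)  # the upper bound (up to rounding)
puts decode_param("1000", -5.0, 5.0)  # 8/15 of the way along the interval
```

With 16 bits per parameter, as used in the listing's default configuration, the interval is divided into 65535 equal steps.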

def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

def random_bitstring(num_bits)
  return (0...num_bits).inject(""){|s,i| s<<((rand<0.5) ? "1" : "0")}
end

# Decode each group of bits_per_param bits into a real value within bounds.
def decode(bitstring, search_space, bits_per_param)
  vector = []
  search_space.each_with_index do |bounds, i|
    off, sum = i*bits_per_param, 0.0
    param = bitstring[off...(off+bits_per_param)].reverse
    param.size.times do |j|
      sum += ((param[j].chr=='1') ? 1.0 : 0.0) * (2.0 ** j.to_f)
    end
    min, max = bounds
    vector << min + ((max-min)/((2.0**bits_per_param.to_f)-1.0)) * sum
  end
  return vector
end

def fitness(candidate, search_space, param_bits)
  candidate[:vector]=decode(candidate[:bitstring], search_space, param_bits)
  candidate[:fitness] = objective_function(candidate[:vector])
end

def binary_tournament(pop)
  i, j = rand(pop.size), rand(pop.size)
  j = rand(pop.size) while j==i
  return (pop[i][:fitness] < pop[j][:fitness]) ? pop[i] : pop[j]
end

# Flip each bit independently with probability rate.
def point_mutation(bitstring, rate=1.0/bitstring.size)
  child = ""
  bitstring.size.times do |i|
    bit = bitstring[i].chr
    child << ((rand()<rate) ? ((bit=='1') ? "0" : "1") : bit)
  end
  return child
end

# Uniform crossover; returns a copy of parent1 when crossover is not applied.
def crossover(parent1, parent2, rate)
  return ""+parent1 if rand()>=rate
  child = ""
  parent1.size.times do |i|
    child << ((rand()<0.5) ? parent1[i].chr : parent2[i].chr)
  end
  return child
end

# Pair up the selected parents and create one child per parent.
def reproduce(selected, pop_size, p_cross, p_mut)
  children = []
  selected.each_with_index do |p1, i|
    p2 = (i.modulo(2)==0) ? selected[i+1] : selected[i-1]
    p2 = selected[0] if i == selected.size-1
    child = {}
    child[:bitstring] = crossover(p1[:bitstring], p2[:bitstring], p_cross)
    child[:bitstring] = point_mutation(child[:bitstring], p_mut)
    children << child
    break if children.size >= pop_size
  end
  return children
end

# Local search: accept a mutated copy when its fitness is the same or better.
def bitclimber(child, search_space, p_mut, max_local_gens, bits_per_param)
  current = child
  max_local_gens.times do
    candidate = {}
    candidate[:bitstring] = point_mutation(current[:bitstring], p_mut)
    fitness(candidate, search_space, bits_per_param)
    current = candidate if candidate[:fitness] <= current[:fitness]
  end
  return current
end

def search(max_gens, search_space, pop_size, p_cross, p_mut,
    max_local_gens, p_local, bits_per_param=16)
  pop = Array.new(pop_size) do |i|
    {:bitstring=>random_bitstring(search_space.size*bits_per_param)}
  end
  pop.each{|candidate| fitness(candidate, search_space, bits_per_param) }
  best = pop.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
  max_gens.times do |gen|
    # global search: selection, crossover and mutation
    selected = Array.new(pop_size){|i| binary_tournament(pop)}
    children = reproduce(selected, pop_size, p_cross, p_mut)
    children.each{|cand| fitness(cand, search_space, bits_per_param)}
    pop = []
    # local search: refine each child with probability p_local
    children.each do |child|
      if rand() < p_local
        child = bitclimber(child, search_space, p_mut, max_local_gens,
          bits_per_param)
      end
      pop << child
    end
    pop.sort!{|x,y| x[:fitness] <=> y[:fitness]}
    best = pop.first if pop.first[:fitness] <= best[:fitness]
    puts ">gen=#{gen}, f=#{best[:fitness]}, b=#{best[:bitstring]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 3
  search_space = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_gens = 100
  pop_size = 100
  p_cross = 0.98
  p_mut = 1.0/(problem_size*16).to_f
  max_local_gens = 20
  p_local = 0.5
  # execute the algorithm
  best = search(max_gens, search_space, pop_size, p_cross, p_mut,
    max_local_gens, p_local)
  puts "done! Solution: f=#{best[:fitness]}, b=#{best[:bitstring]}, v=#{best[:vector].inspect}"
end

Listing 4.5: Memetic Algorithm in Ruby 



4.6.8 References 

Primary Sources 

The concept of a Memetic Algorithm is credited to Moscato [5], who was
inspired by the description of memes in Dawkins' "The Selfish Gene" [1].
Moscato proposed Memetic Algorithms as the marriage between population-based
global search and heuristic local search made by each individual without
the constraints of a genetic representation, and investigated variations
on the Traveling Salesman Problem.



Learn More 

Moscato and Cotta provide a gentle introduction to the field of Memetic
Algorithms as a book chapter that covers formal descriptions of the approach,
a summary of the fields of application, and the state of the art [6]. An
overview and classification of the types of Memetic Algorithms is presented
by Ong et al., who describe a class of adaptive Memetic Algorithms [7].
Krasnogor and Smith also provide a taxonomy of Memetic Algorithms,
focusing on the properties needed to design 'competent' implementations
of the approach with examples on a number of combinatorial optimization
problems [4]. Work by Krasnogor and Gustafson investigates what they refer
to as 'self-generating' Memetic Algorithms that use the memetic principle to
co-evolve the local search applied by individual solutions [3]. For a broader
overview of the field, see the 2005 book "Recent Advances in Memetic
Algorithms" that provides an overview and a number of studies [2].

Chapter 5

Probabilistic Algorithms 



5.1 Overview 

This chapter describes Probabilistic Algorithms.

5.1.1 Probabilistic Models 

Probabilistic Algorithms are those algorithms that model a problem or
search a problem space using a probabilistic model of candidate solutions.
Many Metaheuristics and Computational Intelligence algorithms may be
considered probabilistic, although the difference with these algorithms is the
explicit (rather than implicit) use of the tools of probability in problem
solving. The majority of the algorithms described in this chapter are
referred to as Estimation of Distribution Algorithms.

5.1.2 Estimation of Distribution Algorithms 

Estimation of Distribution Algorithms (EDA), also called Probabilistic
Model-Building Genetic Algorithms (PMBGA), are an extension of the
field of Evolutionary Computation that model a population of candidate
solutions as a probabilistic model. They generally involve iterations that
alternate between creating candidate solutions in the problem space from
a probabilistic model, and reducing a collection of generated candidate
solutions into a probabilistic model.
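The alternation between sampling from a model and re-estimating the model can be sketched in a few lines of Ruby. This is a minimal illustration rather than any particular published EDA: the model is assumed to be a vector of independent per-bit probabilities, the better half of each sample set is kept (truncation selection), the objective is assumed to be the number of 1 bits, and the names sample_solution, estimate_model, and eda_sketch are hypothetical.

```ruby
# Minimal EDA loop (illustrative sketch; names and selection scheme are assumptions).

# Model -> candidate: sample each bit from its probability.
def sample_solution(model, rng)
  model.map { |p| (rng.rand < p) ? 1 : 0 }
end

# Candidates -> model: the frequency of 1s at each position.
def estimate_model(solutions)
  size = solutions.first.size
  Array.new(size) { |i| solutions.sum { |s| s[i] }.to_f / solutions.size }
end

def eda_sketch(num_bits, pop_size, iterations, rng = Random.new(1))
  model = Array.new(num_bits) { 0.5 }
  best = nil
  iterations.times do
    # create candidate solutions from the probabilistic model
    pop = Array.new(pop_size) { sample_solution(model, rng) }
    pop.sort_by! { |s| -s.sum } # assumed objective: more 1 bits is better
    best = pop.first if best.nil? || pop.first.sum > best.sum
    # reduce the better half of the candidates back into a model
    model = estimate_model(pop.first(pop_size / 2))
  end
  [best, model]
end

best, model = eda_sketch(16, 20, 10)
puts "best=#{best.join}, ones=#{best.sum}"
```

The algorithms in this chapter differ mainly in what the model is (a probability vector or a Bayesian network) and in how it is updated from the selected candidates.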

The model at the heart of an EDA typically provides the probabilistic 
expectation of a component or component configuration comprising part 
of an optimal solution. This estimation is typically based on the observed 
frequency of use of the component in better than average candidate solutions. 
The probabilistic model is used to generate candidate solutions in the
problem space, typically in a component-wise or step-wise manner using a
domain specific construction method to ensure validity.

Pelikan et al. provide a comprehensive summary of the field of
probabilistic optimization algorithms, summarizing the core approaches and
their differences [10]. The edited volume by Pelikan, Sastry, and Cantú-Paz
provides a collection of studies on the popular Estimation of Distribution
algorithms as well as methodology for designing algorithms and application
demonstration studies [13]. An edited volume on studies of EDAs by
Larrañaga and Lozano [4] and the follow-up volume by Lozano et al. [5]
provide an applied foundation for the field.

5.1.3 Extensions 

There are many other algorithms and classes of algorithm from the field of
Estimation of Distribution Algorithms that were not described, including but
not limited to:

• Extensions to UMDA: Extensions to the Univariate Marginal Dis- 
tribution Algorithm such as the Bivariate Marginal Distribution Al- 
gorithm (BMDA) [11, 12] and the Factorized Distribution Algorithm 
(FDA) [7]. 

• Extensions to cGA: Extensions to the Compact Genetic Algorithm 
such as the Extended Compact Genetic Algorithm (ECGA) [2, 3]. 

• Extensions to BOA: Extensions to the Bayesian Optimization
Algorithm such as the Hierarchical Bayesian Optimization Algorithm
(hBOA) [8, 9] and the Incremental Bayesian Optimization Algorithm
(iBOA) [14].

• Bayesian Network Algorithms: Other Bayesian network
algorithms such as the Estimation of Bayesian Network Algorithm [1],
and the Learning Factorized Distribution Algorithm (LFDA) [6].

• PIPE: The Probabilistic Incremental Program Evolution that uses 
EDA methods for constructing programs [16]. 

• SHCLVND: The Stochastic Hill-Climbing with Learning by Vectors 
of Normal Distributions algorithm [15].

5.2 Population-Based Incremental Learning

Population-Based Incremental Learning, PBIL. 

5.2.1 Taxonomy 

Population-Based Incremental Learning is an Estimation of Distribution
Algorithm (EDA), also referred to as Probabilistic Model-Building Genetic
Algorithms (PMBGA), an extension to the field of Evolutionary Computation.
PBIL is related to other EDAs such as the Compact Genetic Algorithm
(Section 5.4), the Probabilistic Incremental Program Evolution algorithm,
and the Bayesian Optimization Algorithm (Section 5.5). The fact that the
algorithm maintains a single prototype vector that is updated competitively
shows some relationship to the Learning Vector Quantization algorithm
(Section 8.5).

5.2.2 Inspiration 

Population-Based Incremental Learning is a population-based technique 
without an inspiration. It is related to the Genetic Algorithm and other Evo- 
lutionary Algorithms that are inspired by the biological theory of evolution 
by means of natural selection. 

5.2.3 Strategy 

The information processing objective of the PBIL algorithm is to reduce the
memory required by the Genetic Algorithm. This is done by reducing the
population of candidate solutions to a single prototype vector of attributes
from which candidate solutions can be generated and assessed. Updates
and mutation operators are also applied to the prototype vector, rather
than to the generated candidate solutions.

5.2.4 Procedure 

The Population-Based Incremental Learning algorithm maintains a real- 
valued prototype vector that represents the probability of each component 
being expressed in a candidate solution. Algorithm 5.2.1 provides a pseu- 
docode listing of the Population-Based Incremental Learning algorithm for 
maximizing a cost function. 
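The update applied to the prototype vector in Algorithm 5.2.1 is a weighted average that moves each probability a small step toward the corresponding bit of the best sample. A small worked sketch, using a hypothetical helper named pbil_update and made-up starting values, shows the effect of a learning rate of 0.1:

```ruby
# Move each probability a fraction lrate toward the matching bit of the best
# sample (hypothetical helper, for illustration only).
def pbil_update(vector, best_bits, lrate)
  vector.each_with_index.map { |p, i| p * (1.0 - lrate) + best_bits[i] * lrate }
end

vector = [0.5, 0.5, 0.9]   # illustrative prototype vector
best_bits = [1, 0, 1]      # illustrative best sample of the iteration
updated = pbil_update(vector, best_bits, 0.1)
puts updated.map { |p| p.round(3) }.inspect
```

A probability of 0.5 moves to 0.55 when the best sample holds a 1 in that position and to 0.45 when it holds a 0, so many consistent iterations are needed before a component is close to fixed.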

Algorithm 5.2.1: Pseudocode for PBIL.

Input: Bits_num, Samples_num, Learn_rate, P_mutation, Mutation_factor
Output: S_best

V ← InitializeVector(Bits_num);
S_best ← ∅;
while ¬StopCondition() do
    S_current ← ∅;
    for i to Samples_num do
        S_i ← GenerateSamples(V);
        if Cost(S_i) < Cost(S_current) then
            S_current ← S_i;
            if Cost(S_i) < Cost(S_best) then
                S_best ← S_i;
            end
        end
    end
    foreach S_bit^i ∈ S_current do
        V_bit^i ← V_bit^i × (1.0 - Learn_rate) + S_bit^i × Learn_rate;
        if Rand() < P_mutation then
            V_bit^i ← V_bit^i × (1.0 - Mutation_factor) + Rand() × Mutation_factor;
        end
    end
end
return S_best;

5.2.5 Heuristics

• PBIL was designed to optimize the probability of components from
low cardinality sets, such as bits in a binary string.

• The algorithm has a very small memory footprint (compared to some
population-based evolutionary algorithms) given the compression of
information into a single prototype vector.

• Extensions to PBIL have been proposed that extend the representation
beyond sets to real-valued vectors.

• Variants of PBIL that were proposed in the original paper include
updating the prototype vector with more than one competitive candidate
solution (such as an average of top candidate solutions), and moving
the prototype vector away from the least competitive candidate
solution each iteration.

• Low learning rates are preferred, such as 0.1.

5.2.6 Code Listing

Listing 5.1 provides an example of the Population-Based Incremental Learn- 
ing algorithm implemented in the Ruby Programming Language. The 
demonstration problem is a maximizing binary optimization problem called 
OneMax that seeks a binary string of unity (all '1' bits). The objective 
function only provides an indication of the number of correct bits in a 
candidate string, not the positions of the correct bits. The algorithm is an 
implementation of the simple PBIL algorithm that updates the prototype 
vector based on the best candidate solution generated each iteration. 

# OneMax objective: the number of 1 bits in the string.
def onemax(vector)
  return vector.inject(0){|sum, value| sum + value}
end

# Sample a bitstring from the prototype (probability) vector.
def generate_candidate(vector)
  candidate = {}
  candidate[:bitstring] = Array.new(vector.size)
  vector.each_with_index do |p, i|
    candidate[:bitstring][i] = (rand()<p) ? 1 : 0
  end
  return candidate
end

# Move each probability toward the matching bit of the current best sample.
def update_vector(vector, current, lrate)
  vector.each_with_index do |p, i|
    vector[i] = p*(1.0-lrate) + current[:bitstring][i]*lrate
  end
end

# Occasionally perturb a probability toward a random value.
def mutate_vector(vector, current, coefficient, rate)
  vector.each_with_index do |p, i|
    if rand() < rate
      vector[i] = p*(1.0-coefficient) + rand()*coefficient
    end
  end
end

def search(num_bits, max_iter, num_samples, p_mutate, mut_factor, l_rate)
  vector = Array.new(num_bits){0.5}
  best = nil
  max_iter.times do |iter|
    current = nil
    num_samples.times do
      candidate = generate_candidate(vector)
      candidate[:cost] = onemax(candidate[:bitstring])
      current = candidate if current.nil? or candidate[:cost]>current[:cost]
      best = candidate if best.nil? or candidate[:cost]>best[:cost]
    end
    update_vector(vector, current, l_rate)
    mutate_vector(vector, current, mut_factor, p_mutate)
    puts " >iteration=#{iter}, f=#{best[:cost]}, s=#{best[:bitstring]}"
    break if best[:cost] == num_bits
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  num_bits = 64
  # algorithm configuration
  max_iter = 100
  num_samples = 100
  p_mutate = 1.0/num_bits
  mut_factor = 0.05
  l_rate = 0.1
  # execute the algorithm
  best=search(num_bits, max_iter, num_samples, p_mutate, mut_factor, l_rate)
  puts "done! Solution: f=#{best[:cost]}/#{num_bits}, s=#{best[:bitstring]}"
end

Listing 5.1: Population-Based Incremental Learning in Ruby 



5.2.7 References 

Primary Sources 

The Population-Based Incremental Learning algorithm was proposed by
Baluja in a technical report that proposed the base algorithm as well as a
number of variants inspired by the Learning Vector Quantization algorithm
[1].

Learn More 

Baluja and Caruana provide an excellent overview of PBIL and compare
it to the standard Genetic Algorithm, released as a technical report [3]
and later published [4]. Baluja provides a detailed comparison between
the Genetic Algorithm and PBIL on a range of problems and scales in
another technical report [2]. Greene provided an excellent account of the
applicability of PBIL as a practical optimization algorithm [5]. Hohfeld and
Rudolph provide the first theoretical analysis of the technique and provide
a convergence proof [6].

5.3 Univariate Marginal Distribution Algorithm

Univariate Marginal Distribution Algorithm, UMDA, Univariate Marginal 
Distribution, UMD. 

5.3.1 Taxonomy 

The Univariate Marginal Distribution Algorithm belongs to the field of
Estimation of Distribution Algorithms (EDA), also referred to as Probabilistic
Model-Building Genetic Algorithms (PMBGA), an extension to the field of
Evolutionary Computation. UMDA is closely related to the Factorized
Distribution Algorithm (FDA) and an extension called the Bivariate Marginal
Distribution Algorithm (BMDA). UMDA is related to other EDAs such as
the Compact Genetic Algorithm (Section 5.4), the Population-Based
Incremental Learning algorithm (Section 5.2), and the Bayesian Optimization
Algorithm (Section 5.5).

5.3.2 Inspiration 

The Univariate Marginal Distribution Algorithm is a population-based
technique without an inspiration. It is related to the Genetic Algorithm and
other Evolutionary Algorithms that are inspired by the biological theory of
evolution by means of natural selection.

5.3.3 Strategy 

The information processing strategy of the algorithm is to use the frequency
of the components in a population of candidate solutions in the construction
of new candidate solutions. This is achieved by first measuring the frequency
of each component in the population (the univariate marginal probability)
and using the probabilities to influence the probabilistic selection of
components in the component-wise construction of new candidate solutions.
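As a small worked example of the univariate marginal probabilities, consider four selected bitstrings of length three. The sketch below uses made-up data and a hypothetical helper named marginals; it counts how often each position holds a 1 and then constructs a new candidate component by component from those frequencies.

```ruby
# Frequency of 1s at each position across the selected solutions
# (hypothetical helper, for illustration only).
def marginals(selected)
  length = selected.first.size
  Array.new(length) do |i|
    selected.count { |bits| bits[i] == 1 }.to_f / selected.size
  end
end

selected = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
probs = marginals(selected)
puts probs.inspect # position 0 holds a 1 in three of the four strings

# Component-wise construction of a new candidate from the probabilities.
candidate = probs.map { |p| (rand < p) ? 1 : 0 }
puts candidate.inspect
```

Each position is treated independently, which is why the approach suits problems whose components do not interact.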

5.3.4 Procedure 

Algorithm 5.3.1 provides a pseudocode listing of the Univariate Marginal 
Distribution Algorithm for minimizing a cost function. 

Algorithm 5.3.1: Pseudocode for the UMDA.

Input: Bits_num, Population_size, Selection_size
Output: S_best

Population ← InitializePopulation(Bits_num, Population_size);
EvaluatePopulation(Population);
S_best ← GetBestSolution(Population);
while ¬StopCondition() do
    Selected ← SelectFitSolutions(Population, Selection_size);
    V ← CalculateFrequencyOfComponents(Selected);
    Offspring ← ∅;
    for i to Population_size do
        Offspring ← ProbabilisticallyConstructSolution(V);
    end
    EvaluatePopulation(Offspring);
    S_best ← GetBestSolution(Offspring);
    Population ← Offspring;
end
return S_best;

5.3.5 Heuristics

• UMDA was designed for problems where the components of a solution
are independent (linearly separable).

• A selection method is needed to identify the subset of good solutions
from which to calculate the univariate marginal probabilities. Many
selection methods from the field of Evolutionary Computation may
be used.



5.3.6 Code Listing 

Listing 5.2 provides an example of the Univariate Marginal Distribution Algo- 
rithm implemented in the Ruby Programming Language. The demonstration 
problem is a maximizing binary optimization problem called OneMax that 
seeks a binary string of unity (all '1' bits). The objective function provides 
only an indication of the number of correct bits in a candidate string, not 
the positions of the correct bits. 

The algorithm is an implementation of UMDA that uses the integers 
1 and 0 to represent bits in a binary string representation. A binary 
tournament selection strategy is used and the whole population is replaced 
each iteration. The mechanisms from Evolutionary Computation such as 
elitism and more elaborate selection methods may be implemented as an 
extension. 

# OneMax objective: the number of 1 bits in the string.
def onemax(vector)
  return vector.inject(0){|sum, value| sum + value}
end

def random_bitstring(size)
  return Array.new(size){ ((rand()<0.5) ? 1 : 0) }
end

# Pick two distinct members at random and return the fitter one.
def binary_tournament(pop)
  i, j = rand(pop.size), rand(pop.size)
  j = rand(pop.size) while j==i
  return (pop[i][:fitness] > pop[j][:fitness]) ? pop[i] : pop[j]
end

# Univariate marginal probabilities: the frequency of 1s at each position.
def calculate_bit_probabilities(pop)
  vector = Array.new(pop.first[:bitstring].length, 0.0)
  pop.each do |member|
    member[:bitstring].each_with_index {|v, i| vector[i] += v}
  end
  vector.each_with_index {|f, i| vector[i] = (f.to_f/pop.size.to_f)}
  return vector
end

# Sample a bitstring from the probability vector.
def generate_candidate(vector)
  candidate = {}
  candidate[:bitstring] = Array.new(vector.size)
  vector.each_with_index do |p, i|
    candidate[:bitstring][i] = (rand()<p) ? 1 : 0
  end
  return candidate
end

def search(num_bits, max_iter, pop_size, select_size)
  pop = Array.new(pop_size) do
    {:bitstring=>random_bitstring(num_bits)}
  end
  pop.each{|c| c[:fitness] = onemax(c[:bitstring])}
  best = pop.sort{|x,y| y[:fitness] <=> x[:fitness]}.first
  max_iter.times do |iter|
    selected = Array.new(select_size) { binary_tournament(pop) }
    vector = calculate_bit_probabilities(selected)
    samples = Array.new(pop_size) { generate_candidate(vector) }
    samples.each{|c| c[:fitness] = onemax(c[:bitstring])}
    samples.sort!{|x,y| y[:fitness] <=> x[:fitness]}
    best = samples.first if samples.first[:fitness] > best[:fitness]
    pop = samples
    puts " >iteration=#{iter}, f=#{best[:fitness]}, s=#{best[:bitstring]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  num_bits = 64
  # algorithm configuration
  max_iter = 100
  pop_size = 50
  select_size = 30
  # execute the algorithm
  best = search(num_bits, max_iter, pop_size, select_size)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:bitstring]}"
end

5.3.7 References

Primary Sources 

The Univariate Marginal Distribution Algorithm was described by Mühlenbein
in 1997, in a paper that provides a theoretical foundation (for the field of
investigation in general and the algorithm specifically) [2]. Mühlenbein also
describes an incremental version of UMDA (IUMDA) that is described as
being equivalent to Baluja's Population-Based Incremental Learning (PBIL)
algorithm [1].

Learn More 

Pelikan and Mühlenbein extended the approach to cover problems that
have dependencies between the components (specifically pair-dependencies),
referring to the technique as the Bivariate Marginal Distribution Algorithm
(BMDA) [3, 4].

5.3.8 Bibliography 

[1] S. Baluja. Population-based incremental learning: A method for in- 
tegrating genetic search based function optimization and competitive 
learning. Technical Report CMU-CS-94-163, School of Computer Sci- 
ence, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, June 
1994. 

[2] H. Mühlenbein. The equation for response to selection and its use for
prediction. Evolutionary Computation, 5(3):303-346, 1997.

[3] M. Pelikan and H. Mühlenbein. Marginal distributions in evolutionary
algorithms. In Proceedings of the International Conference on Genetic
Algorithms Mendel, 1998.

[4] M. Pelikan and H. Mühlenbein. Advances in Soft Computing:
Engineering Design and Manufacturing, chapter The Bivariate Marginal
Distribution Algorithm, pages 521-535. Springer, 1999.

5.4 Compact Genetic Algorithm

Compact Genetic Algorithm, CGA, cGA. 

5.4.1 Taxonomy 

The Compact Genetic Algorithm is an Estimation of Distribution Algorithm
(EDA), also referred to as Probabilistic Model-Building Genetic Algorithms
(PMBGA), an extension to the field of Evolutionary Computation. The
Compact Genetic Algorithm is the basis for extensions such as the Extended
Compact Genetic Algorithm (ECGA). It is related to other EDAs such as the
Univariate Marginal Distribution Algorithm (Section 5.3), the Population-Based
Incremental Learning algorithm (Section 5.2), and the Bayesian
Optimization Algorithm (Section 5.5).

5.4.2 Inspiration 

The Compact Genetic Algorithm is a probabilistic technique without an 
inspiration. It is related to the Genetic Algorithm and other Evolutionary 
Algorithms that are inspired by the biological theory of evolution by means 
of natural selection. 

5.4.3 Strategy 

The information processing objective of the algorithm is to simulate the
behavior of a Genetic Algorithm with a much smaller memory footprint
(without requiring a population to be maintained). This is achieved by
maintaining a vector that specifies the probability of including each
component in new candidate solutions. Candidate solutions are
probabilistically generated from the vector, and the components in the better
solution are used to make small changes to the probabilities in the vector.

5.4.4 Procedure 

The Compact Genetic Algorithm maintains a real-valued prototype vector 
that represents the probability of each component being expressed in a 
candidate solution. Algorithm 5.4.1 provides a pseudocode listing of the 
Compact Genetic Algorithm for maximizing a cost function. The parameter 
n indicates the amount to update probabilities for conflicting bits in each 
algorithm iteration. 

Algorithm 5.4.1: Pseudocode for the cGA.

Input: Bits_num, n
Output: S_best

V ← InitializeVector(Bits_num, 0.5);
S_best ← ∅;
while ¬StopCondition() do
    S_1 ← GenerateSamples(V);
    S_2 ← GenerateSamples(V);
    S_winner, S_loser ← SelectWinnerAndLoser(S_1, S_2);
    if Cost(S_winner) < Cost(S_best) then
        S_best ← S_winner;
    end
    for i to Bits_num do
        if S_winner^i ≠ S_loser^i then
            if S_winner^i = 1 then
                V^i ← V^i + 1/n;
            else
                V^i ← V^i - 1/n;
            end
        end
    end
end
return S_best;

5.4.5 Heuristics

• The vector update parameter (n) influences the amount that the
probabilities are updated each algorithm iteration.

• The vector update parameter (n) may be considered to be comparable
to the population size parameter in the Genetic Algorithm.

• Early results demonstrate that the cGA may be comparable to a
standard Genetic Algorithm on classical binary string optimization
problems (such as OneMax).

• The algorithm may be considered to have converged if the vector
probabilities are all either 0 or 1.
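The convergence heuristic can be checked directly on the probability vector. The sketch below is an illustration under stated assumptions: the helper names clamp_probability and converged? are hypothetical, clamping to the range [0, 1] is a defensive choice that is not taken from the pseudocode, and a small tolerance is used because repeated additions of 1/n in floating point may not land exactly on 0 or 1.

```ruby
# Keep a probability inside [0, 1] after an update of +/- 1/n
# (defensive assumption, not part of the pseudocode).
def clamp_probability(p)
  [[p, 0.0].max, 1.0].min
end

# True when every probability is within tolerance of 0 or 1.
def converged?(vector, tolerance = 1e-9)
  vector.all? { |p| p <= tolerance || p >= 1.0 - tolerance }
end

n = 20
p = 0.5
# twelve consecutive wins for a '1' bit at this position
12.times { p = clamp_probability(p + 1.0 / n) }
puts p
puts converged?([p, 0.0, 1.0])
puts converged?([0.5, 1.0])
```

With n = 20, a probability that starts at 0.5 needs ten consecutive updates in the same direction to reach an extreme, which illustrates why n behaves somewhat like a population size.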

5.4.6 Code Listing 

Listing 5.3 provides an example of the Compact Genetic Algorithm imple- 
mented in the Ruby Programming Language. The demonstration problem 
is a maximizing binary optimization problem called OneMax that seeks a 
binary string of unity (all '1' bits). The objective function only provides an 
indication of the number of correct bits in a candidate string, not the
positions of the correct bits. The algorithm is an implementation of the
Compact Genetic Algorithm that uses integer values to represent 1 and 0 bits
in a binary string representation.

# OneMax objective: the number of 1 bits in the string.
def onemax(vector)
  return vector.inject(0){|sum, value| sum + value}
end

# Sample a bitstring from the probability vector and evaluate it.
def generate_candidate(vector)
  candidate = {}
  candidate[:bitstring] = Array.new(vector.size)
  vector.each_with_index do |p, i|
    candidate[:bitstring][i] = (rand()<p) ? 1 : 0
  end
  candidate[:cost] = onemax(candidate[:bitstring])
  return candidate
end

# Where winner and loser disagree, shift the probability toward the winner.
def update_vector(vector, winner, loser, pop_size)
  vector.size.times do |i|
    if winner[:bitstring][i] != loser[:bitstring][i]
      if winner[:bitstring][i] == 1
        vector[i] += 1.0/pop_size.to_f
      else
        vector[i] -= 1.0/pop_size.to_f
      end
    end
  end
end

def search(num_bits, max_iterations, pop_size)
  vector = Array.new(num_bits){0.5}
  best = nil
  max_iterations.times do |iter|
    c1 = generate_candidate(vector)
    c2 = generate_candidate(vector)
    winner, loser = (c1[:cost] > c2[:cost] ? [c1,c2] : [c2,c1])
    best = winner if best.nil? or winner[:cost]>best[:cost]
    update_vector(vector, winner, loser, pop_size)
    puts " >iteration=#{iter}, f=#{best[:cost]}, s=#{best[:bitstring]}"
    break if best[:cost] == num_bits
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  num_bits = 32
  # algorithm configuration
  max_iterations = 200
  pop_size = 20
  # execute the algorithm
  best = search(num_bits, max_iterations, pop_size)
  puts "done! Solution: f=#{best[:cost]}/#{num_bits}, s=#{best[:bitstring]}"
end

5.4.7 References

Primary Sources 

The Compact Genetic Algorithm was proposed by Harik, Lobo, and Gold- 
berg in 1999 [3], based on a random walk model previously introduced by 
Harik et al. [2]. In the introductory paper, the cGA is demonstrated to be 
comparable to the Genetic Algorithm on standard binary string optimization 
problems. 

Learn More 

Harik et al. extended the Compact Genetic Algorithm (called the Extended
Compact Genetic Algorithm) to generate populations of candidate solutions
and perform selection (much like the Univariate Marginal Distribution
Algorithm), although it used Marginal Product Models [1, 4]. Sastry and
Goldberg performed further analysis into the Extended Compact Genetic
Algorithm, applying the method to a complex optimization problem [5].

5.4.8 Bibliography 

[1] G. R. Harik. Linkage learning via probabilistic modeling in the extended 
compact genetic algorithm (ECGA). Technical Report 99010, Illinois 
Genetic Algorithms Laboratory, Department of General Engineering, 
University of Illinois, 1999. 

[2] G. R. Harik, E. Cantú-Paz, D. E. Goldberg, and B. L. Miller. The
gambler's ruin problem, genetic algorithms, and the sizing of populations.
In IEEE International Conference on Evolutionary Computation, pages
7-12, 1997.

[3] G. R. Harik, F. G. Lobo, and D. E. Goldberg. The compact genetic
algorithm. IEEE Transactions on Evolutionary Computation,
3(4):287-297, 1999.

[4] G. R. Harik, F. G. Lobo, and K. Sastry. Scalable Optimization via Prob- 
abilistic Modeling, chapter Linkage Learning via Probabilistic Modeling 
in the Extended Compact Genetic Algorithm (ECGA), pages 39-61. 
Springer, 2006. 

[5] K. Sastry and D. E. Goldberg. On extended compact genetic algorithm. 
In Late Breaking Paper in Genetic and Evolutionary Computation Con- 
ference, pages 352-359, 2000. 






5.5 Bayesian Optimization Algorithm 

Bayesian Optimization Algorithm, BOA. 

5.5.1 Taxonomy 

The Bayesian Optimization Algorithm belongs to the field of Estimation
of Distribution Algorithms, also referred to as Probabilistic Model-Building
Genetic Algorithms (PMBGA), an extension to the field of Evolutionary
Computation. More broadly, BOA belongs to the field of Computational
Intelligence. The Bayesian Optimization Algorithm is related to other
Estimation of Distribution Algorithms such as the Population-Based Incremental
Learning algorithm (Section 5.2), and the Univariate Marginal Distribution
Algorithm (Section 5.3). It is also the basis for extensions such as the
Hierarchical Bayesian Optimization Algorithm (hBOA) and the Incremental
Bayesian Optimization Algorithm (iBOA).

5.5.2 Inspiration 

Bayesian Optimization Algorithm is a technique without an inspiration. 
It is related to the Genetic Algorithm and other Evolutionary Algorithms 
that are inspired by the biological theory of evolution by means of natural 
selection. 

5.5.3 Strategy 

The information processing objective of the technique is to construct a
probabilistic model that describes the relationships between the components
of fit solutions in the problem space. This is achieved by repeating the
process of creating and sampling from a Bayesian network that contains
the conditional dependencies, independencies, and conditional probabilities
between the components of a solution. The network is constructed from
the relative frequencies of the components within a population of high
fitness candidate solutions. Once the network is constructed, the candidate
solutions are discarded and a new population of candidate solutions is
generated from the model. The process is repeated until the model converges
on a fit prototype solution.

5.5.4 Procedure 

Algorithm 5.5.1 provides a pseudocode listing of the Bayesian Optimization
Algorithm for minimizing a cost function. The Bayesian network is
constructed each iteration using a greedy algorithm. The network is assessed
based on its fit of the information in the population of candidate solutions
using either a Bayesian Dirichlet Metric (BD) [9], or a Bayesian Information
Criterion (BIC). Refer to Chapter 3 of Pelikan's book for a more detailed
presentation of the pseudocode for BOA [5].

Algorithm 5.5.1: Pseudocode for BOA.

Input: Bits_num, Population_size, Selection_size
Output: S_best

Population ← InitializePopulation(Bits_num, Population_size);
EvaluatePopulation(Population);
S_best ← GetBestSolution(Population);
while ¬StopCondition() do
    Selected ← SelectFitSolutions(Population, Selection_size);
    Model ← ConstructBayesianNetwork(Selected);
    Offspring ← ∅;
    for i to Population_size do
        Offspring ← ProbabilisticallyConstructSolution(Model);
    end
    EvaluatePopulation(Offspring);
    S_best ← GetBestSolution(Offspring);
    Population ← Combine(Population, Offspring);
end
return S_best;



5.5.5 Heuristics 

• The Bayesian Optimization Algorithm was designed and investigated
on binary string-based problems, most commonly representing binary
function optimization problems.

• Bayesian networks are typically constructed (grown) from scratch each 
iteration using an iterative process of adding, removing, and reversing 
links. Additionally, past networks may be used as the basis for the 
subsequent generation. 

• A greedy hill-climbing algorithm is used each algorithm iteration to 
optimize a Bayesian network to represent a population of candidate 
solutions. 

• The fitness of constructed Bayesian networks may be assessed using
the Bayesian Dirichlet Metric (BD) or a Minimum Description Length
method called the Bayesian Information Criterion (BIC).

5.5.6 Code Listing 

Listing 5.4 provides an example of the Bayesian Optimization Algorithm
implemented in the Ruby Programming Language. The demonstration
problem is a maximizing binary optimization problem called OneMax that
seeks a binary string of unity (all '1' bits). The objective function provides
only an indication of the number of correct bits in a candidate string, not
the positions of the correct bits.

The Bayesian Optimization Algorithm can be tricky to implement given
the use of a Bayesian Network at the core of the technique. The
implementation of BOA provided is based on the C++ implementation provided
by Pelikan, version 1.0 [3]. Specifically, the implementation uses the K2
metric to construct a Bayesian network from a population of candidate
solutions [1]. Essentially, the construction is a greedy algorithm that starts
with an empty graph and adds the arc with the most gain under this metric
each iteration until a maximum number of edges have been added or no
further edges can be added. The result is a directed acyclic graph. The
process that constructs the graph imposes limits, such as the maximum
number of edges and the maximum number of in-bound connections per node.
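As a small illustration of the scoring step, the following sketch computes the K2 score of a binary variable with and without a candidate parent. It is a minimal example under stated assumptions and is not part of Listing 5.4: the helper names factorial and k2_score and the four-row data set are hypothetical, and all variables are assumed to be binary.

```ruby
# Factorial for small non-negative integers (hypothetical helper).
def factorial(n)
  (1..n).inject(1) { |acc, v| acc * v }
end

# K2 score of the binary variable `node` given a list of `parents`,
# estimated from `data` (an array of bit arrays). Assumes binary variables,
# so each parent configuration contributes zeros! * ones! / (zeros+ones+1)!.
def k2_score(data, node, parents)
  # group the rows by the values taken by the parent bits
  groups = data.group_by { |row| parents.map { |p| row[p] } }
  groups.values.inject(1.0) do |score, rows|
    ones = rows.count { |row| row[node] == 1 }
    zeros = rows.size - ones
    score * factorial(zeros) * factorial(ones) / factorial(rows.size + 1).to_f
  end
end

# Illustrative data: the second bit always copies the first bit.
data = [[0, 0], [0, 0], [1, 1], [1, 1]]
without_parent = k2_score(data, 1, [])
with_parent = k2_score(data, 1, [0])
puts "no parent: #{without_parent}, with bit 0 as parent: #{with_parent}"
```

On this data the score with bit 0 as a parent is higher than the score with no parent, which is the kind of gain the greedy construction favors when choosing the next arc.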

New solutions are sampled from the graph by first topologically ordering
the graph (so that bits can be generated based on their dependencies), then
probabilistically sampling the bits based on the conditional probabilities
encoded in the graph. The algorithm used for sampling the conditional
probabilities from the network is Probabilistic Logic Sampling [2]. The
search stops when either the best solution for the problem is found or
the population converges to a single bit pattern.

Given that the implementation was written for clarity, it is slow to execute
and provides a great opportunity for improvements and efficiencies.

# OneMax objective: counts the '1' bits in the bitstring.
def onemax(vector)
  return vector.inject(0) {|sum, value| sum + value}
end

def random_bitstring(size)
  return Array.new(size) { ((rand() < 0.5) ? 1 : 0) }
end

# Depth-first search for a directed path from node i to node j.
def path_exists?(i, j, graph)
  visited, stack = [], [i]
  while !stack.empty?
    return true if stack.include?(j)
    k = stack.shift
    next if visited.include?(k)
    visited << k
    graph[k][:out].each {|m| stack.unshift(m) if !visited.include?(m)}
  end
  return false
end

# An edge i->j is allowed if it is not already present and would not
# create a cycle.
def can_add_edge?(i, j, graph)
  return !graph[i][:out].include?(j) && !path_exists?(j, i, graph)
end

def get_viable_parents(node, graph)
  viable = []
  graph.size.times do |i|
    if node!=i and can_add_edge?(node, i, graph)
      viable << i
    end
  end
  return viable
end

# Counts how often each joint bit pattern over the given indexes occurs
# in the population.
def compute_count_for_edges(pop, indexes)
  counts = Array.new(2**(indexes.size)) {0}
  pop.each do |p|
    index = 0
    indexes.reverse.each_with_index do |v,i|
      index += ((p[:bitstring][v] == 1) ? 1 : 0) * (2**i)
    end
    counts[index] += 1
  end
  return counts
end

def fact(v)
  return v <= 1 ? 1 : v*fact(v-1)
end

# K2 metric for a node given a list of candidate parents.
def k2equation(node, candidates, pop)
  counts = compute_count_for_edges(pop, [node]+candidates)
  total = nil
  (counts.size/2).times do |i|
    a1, a2 = counts[i*2], counts[(i*2)+1]
    rs = (1.0/fact((a1+a2)+1).to_f) * fact(a1).to_f * fact(a2).to_f
    total = (total.nil? ? rs : total*rs)
  end
  return total
end

# Scores each viable candidate with the K2 metric; other candidates keep
# a gain of -1.0.
def compute_gains(node, graph, pop, max=2)
  viable = get_viable_parents(node[:num], graph)
  gains = Array.new(graph.size) {-1.0}
  gains.each_index do |i|
    if graph[i][:in].size < max and viable.include?(i)
      gains[i] = k2equation(node[:num], node[:in]+[i], pop)
    end
  end
  return gains
end

# Greedily adds the edge with the largest gain until no gain is positive
# or max_edges edges have been added.
def construct_network(pop, prob_size, max_edges=3*pop.size)
  graph = Array.new(prob_size) {|i| {:out=>[], :in=>[], :num=>i} }
  gains = Array.new(prob_size)
  max_edges.times do
    max, from, to = -1, nil, nil
    graph.each_with_index do |node, i|
      gains[i] = compute_gains(node, graph, pop)
      gains[i].each_with_index {|v,j| from, to, max = i,j,v if v>max}
    end
    break if max <= 0.0
    graph[from][:out] << to
    graph[to][:in] << from
  end
  return graph
end

# Orders the nodes so that each node appears after the nodes in its :in list.
def topological_ordering(graph)
  graph.each {|n| n[:count] = n[:in].size}
  ordered, stack = [], graph.select {|n| n[:count]==0}
  while ordered.size < graph.size
    current = stack.shift
    current[:out].each do |edge|
      node = graph.find {|n| n[:num]==edge}
      node[:count] -= 1
      stack << node if node[:count] <= 0
    end
    ordered << current
  end
  return ordered
end

def marginal_probability(i, pop)
  return pop.inject(0.0){|s,x| s + x[:bitstring][i]} / pop.size.to_f
end

# Probability that the node's bit is 1 given the already-sampled bits of
# the nodes in its :in list, estimated from counts in the population.
def calculate_probability(node, bitstring, graph, pop)
  return marginal_probability(node[:num], pop) if node[:in].empty?
  counts = compute_count_for_edges(pop, [node[:num]]+node[:in])
  index = 0
  node[:in].reverse.each_with_index do |v,i|
    index += ((bitstring[v] == 1) ? 1 : 0) * (2**i)
  end
  i1 = index + (1*2**(node[:in].size))
  i2 = index + (0*2**(node[:in].size))
  a1, a2 = counts[i1].to_f, counts[i2].to_f
  return a1/(a1+a2)
end

def probabilistic_logic_sample(graph, pop)
  bitstring = Array.new(graph.size)
  graph.each do |node|
    prob = calculate_probability(node, bitstring, graph, pop)
    bitstring[node[:num]] = ((rand() < prob) ? 1 : 0)
  end
  return {:bitstring=>bitstring}
end

def sample_from_network(pop, graph, num_samples)
  ordered = topological_ordering(graph)
  samples = Array.new(num_samples) do
    probabilistic_logic_sample(ordered, pop)
  end
  return samples
end
def search(num_bits, max_iter, pop_size, select_size, num_children)
  pop = Array.new(pop_size) { {:bitstring=>random_bitstring(num_bits)} }
  pop.each{|c| c[:cost] = onemax(c[:bitstring])}
  best = pop.sort!{|x,y| y[:cost] <=> x[:cost]}.first
  max_iter.times do |it|
    selected = pop.first(select_size)
    network = construct_network(selected, num_bits)
    arcs = network.inject(0){|s,x| s+x[:out].size}
    children = sample_from_network(selected, network, num_children)
    children.each{|c| c[:cost] = onemax(c[:bitstring])}
    children.each {|c| puts " >>sample, f=#{c[:cost]} #{c[:bitstring]}"}
    pop = pop[0...(pop_size-select_size)] + children
    pop.sort! {|x,y| y[:cost] <=> x[:cost]}
    best = pop.first if pop.first[:cost] >= best[:cost]
    puts " >it=#{it}, arcs=#{arcs}, f=#{best[:cost]}, [#{best[:bitstring]}]"
    converged = pop.select {|x| x[:bitstring]!=pop.first[:bitstring]}.empty?
    break if converged or best[:cost]==num_bits
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  num_bits = 20
  # algorithm configuration
  max_iter = 100
  pop_size = 50
  select_size = 15
  num_children = 25
  # execute the algorithm
  best = search(num_bits, max_iter, pop_size, select_size, num_children)
  puts "done! Solution: f=#{best[:cost]}/#{num_bits}, s=#{best[:bitstring]}"
end

Listing 5.4: Bayesian Optimization Algorithm in Ruby 



5.5.7 References 
Primary Sources 

The Bayesian Optimization Algorithm was proposed by Pelikan, Goldberg,
and Cantú-Paz in a technical report [8] that was later published [10].
The technique was proposed as an extension to the state of Estimation
of Distribution algorithms (such as the Univariate Marginal Distribution
Algorithm and the Bivariate Marginal Distribution Algorithm) that used a
Bayesian Network to model the relationships and conditional probabilities
for the components expressed in a population of fit candidate solutions.
Pelikan, Goldberg, and Cantú-Paz also described the approach applied to
deceptive binary optimization problems (trap functions) in a paper that
was published before the seminal journal article [9].

Learn More

Pelikan and Goldberg described an extension to the approach called the
Hierarchical Bayesian Optimization Algorithm (hBOA) [6, 7]. The differences
in the hBOA algorithm are that it replaces the decision tables (used to store
the probabilities) with decision graphs and uses a niching method called
Restricted Tournament Replacement to maintain diversity in the selected
set of candidate solutions used to construct the network models. Pelikan's
work on BOA culminated in his PhD thesis that provides a detailed
treatment of the approach, its configuration and application [4]. Pelikan, Sastry,
and Goldberg proposed the Incremental Bayesian Optimization Algorithm
(iBOA) extension of the approach that removes the population and adds
incremental updates to the Bayesian network [11].

Pelikan published a book that focused on the technique, walking through
the development of probabilistic algorithms inspired by evolutionary
computation, a detailed look at the Bayesian Optimization Algorithm (Chapter 3),
the hierarchical extension (the Hierarchical Bayesian Optimization Algorithm),
and demonstration studies of the approach on test problems [5].
5.6 Cross-Entropy Method

Cross-Entropy Method, Cross Entropy Method, CEM. 

5.6.1 Taxonomy 

The Cross-Entropy Method is a probabilistic optimization technique belonging
to the field of Stochastic Optimization. It is similar to other Stochastic
Optimization algorithms such as Simulated Annealing (Section 4.2),
and to Estimation of Distribution Algorithms such as the Population-Based
Incremental Learning algorithm (Section 5.2).

5.6.2 Inspiration 

The Cross-Entropy Method does not have an inspiration. It was developed 
as an efficient estimation technique for rare-event probabilities in discrete 
event simulation systems and was adapted for use in optimization. The name 
of the technique comes from the Kullback-Leibler cross-entropy method for 
measuring the amount of information (bits) needed to identify an event 
from a set of probabilities. 

5.6.3 Strategy 

The information processing strategy of the algorithm is to sample the 
problem space and approximate the distribution of good solutions. This is 
achieved by assuming a distribution of the problem space (such as Gaussian), 
sampling the problem domain by generating candidate solutions using the 
distribution, and updating the distribution based on the better candidate 
solutions discovered. Samples are constructed step-wise (one component at 
a time) based on the summarized distribution of good solutions. As the 
algorithm progresses, the distribution becomes more refined until it focuses 
on the area or scope of optimal solutions in the domain. 

5.6.4 Procedure 

Algorithm 5.6.1 provides a pseudocode listing of the Cross-Entropy Method
algorithm for minimizing a cost function.

5.6.5 Heuristics 

• The Cross-Entropy Method was adapted for combinatorial optimization
problems, although it has been applied to continuous function
optimization as well as noisy simulation problems.

• An alpha (α) parameter or learning rate ∈ [0, 1] is typically set high,
such as 0.7.
Algorithm 5.6.1: Pseudocode for the Cross-Entropy Method.

Input: Problem_size, Samples_num, UpdateSamples_num, Learn_rate, Variance_min
Output: S_best
1  Means ← InitializeMeans();
2  Variances ← InitializeVariances();
3  S_best ← ∅;
4  while Max(Variances) > Variance_min do
5    Samples ← ∅;
6    for i = 0 to Samples_num do
7      Samples ← GenerateSample(Means, Variances);
8    end
9    EvaluateSamples(Samples);
10   SortSamplesByQuality(Samples);
11   if Cost(Samples_0) ≤ Cost(S_best) then
12     S_best ← Samples_0;
13   end
14   Samples_selected ← SelectBestSamples(Samples, UpdateSamples_num);
15   for i = 0 to Problem_size do
16     Means_i ← Means_i + Learn_rate × Mean(Samples_selected, i);
17     Variances_i ← Variances_i + Learn_rate × Variance(Samples_selected, i);
18   end
19 end
20 return S_best;



• A smoothing function can be used to further control the updates to the
summaries of the distribution(s) of samples from the problem space.
For example, in continuous function optimization a β parameter may
replace α for updating the standard deviation, calculated at time t as
$\beta_t = \beta - \beta \times (1 - \frac{1}{t})^q$, where β is initially set high ∈ [0.8, 0.99] and q
is a small integer ∈ [5, 10].
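The smoothing schedule in the last heuristic can be sketched in a few lines. The example assumes the form $\beta_t = \beta - \beta \times (1 - \frac{1}{t})^q$ with iterations counted from t = 1; the function name dynamic_beta and the chosen values of β and q are illustrative only, and how the resulting coefficient is blended into the standard deviation update depends on the convention of the update rule and is not shown.

```ruby
# Dynamic smoothing coefficient for iteration t (assumes t >= 1), following
# beta_t = beta - beta * (1 - 1/t)^q. The name is hypothetical.
def dynamic_beta(beta, q, t)
  beta - beta * ((1.0 - (1.0 / t))**q)
end

# Print the first few values of the schedule for beta=0.9 and q=5.
schedule = (1..5).map { |t| dynamic_beta(0.9, 5, t) }
schedule.each_with_index do |b, idx|
  puts "t=#{idx + 1}, beta_t=#{b.round(4)}"
end
```

The coefficient starts at β on the first iteration and shrinks as t grows, so the amount of smoothing changes over the course of the run.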



5.6.6 Code Listing 

Listing 5.5 provides an example of the Cross-Entropy Method algorithm
implemented in the Ruby Programming Language. The demonstration
problem is an instance of a continuous function optimization problem that
seeks $\min f(x)$ where $f = \sum_{i=1}^{n} x_i^2$, $-5.0 \leq x_i \leq 5.0$ and $n = 3$. The
optimal solution for this basin function is $(v_0, \ldots, v_{n-1}) = 0.0$.

The algorithm was implemented based on a description of the
Cross-Entropy Method algorithm for continuous function optimization by
Rubinstein and Kroese in Chapter 5 and Appendix A of their book on the
method [5]. The algorithm maintains means and standard deviations of the
distribution of samples for convenience. The means and standard deviations
are initialized based on random positions in the problem space and the
bounds of the whole problem space respectively. A smoothing parameter is
not used on the standard deviations.



def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

def random_variable(minmax)
  min, max = minmax
  return min + ((max - min) * rand())
end

# Gaussian sample generated with the polar form of the Box-Muller method.
def random_gaussian(mean=0.0, stdev=1.0)
  u1 = u2 = w = 0
  begin
    u1 = 2 * rand() - 1
    u2 = 2 * rand() - 1
    w = u1 * u1 + u2 * u2
  end while w >= 1
  w = Math.sqrt((-2.0 * Math.log(w)) / w)
  return mean + (u2 * w) * stdev
end

# Samples each component from its Gaussian and clamps it to the bounds.
def generate_sample(search_space, means, stdevs)
  vector = Array.new(search_space.size)
  search_space.size.times do |i|
    vector[i] = random_gaussian(means[i], stdevs[i])
    vector[i] = search_space[i][0] if vector[i] < search_space[i][0]
    vector[i] = search_space[i][1] if vector[i] > search_space[i][1]
  end
  return {:vector=>vector}
end

def mean_attr(samples, i)
  sum = samples.inject(0.0) do |s,sample|
    s + sample[:vector][i]
  end
  return (sum / samples.size.to_f)
end

def stdev_attr(samples, mean, i)
  sum = samples.inject(0.0) do |s,sample|
    s + (sample[:vector][i] - mean)**2.0
  end
  return Math.sqrt(sum / samples.size.to_f)
end

def update_distribution!(samples, alpha, means, stdevs)
  means.size.times do |i|
    means[i] = alpha*means[i] + ((1.0-alpha)*mean_attr(samples, i))
    stdevs[i] = alpha*stdevs[i] + ((1.0-alpha)*stdev_attr(samples, means[i], i))
  end
end

def search(bounds, max_iter, num_samples, num_update, learning_rate)
  means = Array.new(bounds.size) {|i| random_variable(bounds[i])}
  stdevs = Array.new(bounds.size) {|i| bounds[i][1]-bounds[i][0]}
  best = nil
  max_iter.times do |iter|
    samples = Array.new(num_samples) {generate_sample(bounds, means, stdevs)}
    samples.each {|samp| samp[:cost] = objective_function(samp[:vector])}
    samples.sort! {|x,y| x[:cost]<=>y[:cost]}
    best = samples.first if best.nil? or samples.first[:cost] < best[:cost]
    selected = samples.first(num_update)
    update_distribution!(selected, learning_rate, means, stdevs)
    puts " > iteration=#{iter}, fitness=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 3
  search_space = Array.new(problem_size) {|i| [-5, 5]}
  # algorithm configuration
  max_iter = 100
  num_samples = 50
  num_update = 5
  l_rate = 0.7
  # execute the algorithm
  best = search(search_space, max_iter, num_samples, num_update, l_rate)
  puts "done! Solution: f=#{best[:cost]}, s=#{best[:vector].inspect}"
end

Listing 5.5: Cross-Entropy Method in Ruby



5.6.7 References 
Primary Sources 

The Cross-Entropy method was proposed by Rubinstein in 1997 [2] for use 
in optimizing discrete event simulation systems. It was later generalized 
by Rubinstein and proposed as an optimization method for combinatorial 
function optimization in 1999 [3]. This work was further elaborated by 
Rubinstein providing a detailed treatment on the use of the Cross-Entropy 
method for combinatorial optimization [4]. 

Learn More 

De Boer et al. provide a detailed presentation of the Cross-Entropy method
including its application in rare event simulation, its adaptation to
combinatorial optimization, and example applications to the max-cut problem,
the traveling salesman problem, and a clustering numeric optimization
example [1]. Rubinstein and Kroese provide a thorough presentation of the
approach in their book, summarizing the relevant theory and the state of
the art [5].
6.1 Overview

This chapter describes Swarm Algorithms. 

6.1.1 Swarm Intelligence 

Swarm intelligence is the study of computational systems inspired by
'collective intelligence'. Collective intelligence emerges through the
cooperation of large numbers of homogeneous agents in the environment. Examples
include schools of fish, flocks of birds, and colonies of ants. Such intelligence
is decentralized, self-organizing and distributed throughout an environment.
In nature such systems are commonly used to solve problems such as
effective foraging for food, prey evading, or colony re-location. The information
is typically stored throughout the participating homogeneous agents, or is
stored or communicated in the environment itself such as through the use
of pheromones in ants, dancing in bees, and proximity in fish and birds.

The paradigm consists of two dominant sub-fields: 1) Ant Colony
Optimization that investigates probabilistic algorithms inspired by the stigmergy
and foraging behavior of ants, and 2) Particle Swarm Optimization that
investigates probabilistic algorithms inspired by flocking, schooling and
herding behaviors. Like evolutionary computation, swarm intelligence 'algorithms' or
'strategies' are considered adaptive strategies and are typically applied to
search and optimization domains.

6.1.2 References 

Seminal books on the field of Swarm Intelligence include "Swarm Intelligence"
by Kennedy, Eberhart and Shi [10], and "Swarm Intelligence: From Natural
to Artificial Systems" by Bonabeau, Dorigo, and Theraulaz [3]. Another
excellent text book on the area is "Fundamentals of Computational Swarm
Intelligence" by Engelbrecht [7]. The seminal book reference for the field
of Ant Colony Optimization is "Ant Colony Optimization" by Dorigo and
Stützle [6].

6.1.3 Extensions 

There are many other algorithms and classes of algorithm from the field of
Swarm Intelligence that were not described, including but not limited to:

• Ant Algorithms: such as Max-Min Ant Systems [15], Rank-Based
Ant Systems [4], Elitist Ant Systems [5], Hyper Cube Ant Colony
Optimization [2], Approximate Nondeterministic Tree-Search (ANTS)
[12] and Multiple Ant Colony System [8].

• Bee Algorithms: such as Bee System and Bee Colony Optimiza- 
tion [11], the Honey Bee Algorithm [16], and Artificial Bee Colony 
Optimization [1, 9]. 

• Other Social Insects: algorithms inspired by other social insects
besides ants and bees, such as the Firefly Algorithm [18] and the Wasp
Swarm Algorithm [14].

• Extensions to Particle Swarm: such as Repulsive Particle Swarm 
Optimization [17]. 

• Bacteria Algorithms: such as the Bacteria Chemotaxis Algorithm 
[13]. 
6.2 Particle Swarm Optimization

Particle Swarm Optimization, PSO. 

6.2.1 Taxonomy 

Particle Swarm Optimization belongs to the field of Swarm Intelligence and
Collective Intelligence and is a sub-field of Computational Intelligence.
Particle Swarm Optimization is related to other Swarm Intelligence algorithms
such as Ant Colony Optimization and it is a baseline algorithm for many
variations, too numerous to list.

6.2.2 Inspiration 

Particle Swarm Optimization is inspired by the social foraging behavior of 
some animals such as flocking behavior of birds and the schooling behavior 
of fish. 

6.2.3 Metaphor 

Particles in the swarm fly through an environment following the fitter mem- 
bers of the swarm and generally biasing their movement toward historically 
good areas of their environment. 

6.2.4 Strategy 

The goal of the algorithm is to have all the particles locate the optima in
a multi-dimensional hyper-volume. This is achieved by assigning initially
random positions to all particles in the space and small initial random
velocities. The algorithm is executed like a simulation, advancing the
position of each particle in turn based on its velocity, the best known global
position in the problem space and the best position known to a particle. The
objective function is sampled after each position update. Over time, through
a combination of exploration and exploitation of known good positions in the
search space, the particles cluster or converge together around an optimum,
or several optima.

6.2.5 Procedure 

The Particle Swarm Optimization algorithm is comprised of a collection
of particles that move around the search space influenced by their own
best past location and the best past location of the whole swarm or a close
neighbor. Each iteration a particle's velocity is updated using:

$v_i(t + 1) = v_i(t) + (c_1 \times rand() \times (p_i^{best} - p_i(t))) + (c_2 \times rand() \times (p_{gbest} - p_i(t)))$

where $v_i(t + 1)$ is the new velocity for the $i^{th}$ particle, $c_1$ and $c_2$ are
the weighting coefficients for the personal best and global best positions
respectively, $p_i(t)$ is the $i^{th}$ particle's position at time $t$, $p_i^{best}$ is the $i^{th}$
particle's best known position, and $p_{gbest}$ is the best position known to the
swarm. The $rand()$ function generates a uniformly random variable ∈ [0, 1].
Variants on this update equation consider best positions within a particle's
local neighborhood at time $t$.

A particle's position is updated using: 

$p_i(t + 1) = p_i(t) + v_i(t)$ (6.1)

Algorithm 6.2.1 provides a pseudocode listing of the Particle Swarm 
Optimization algorithm for minimizing a cost function. 

6.2.6 Heuristics 

• The number of particles should be low, around 20-40.

• The speed a particle can move (maximum change in its position per 
iteration) should be bounded, such as to a percentage of the size of 
the domain. 

• The learning factors (biases towards global and personal best positions) 
should be between 0 and 4, typically 2. 

• A local bias (local neighborhood) factor can be introduced where 
neighbors are determined based on Euclidean distance between particle 
positions. 

• Particles may leave the boundary of the problem space and may be
penalized, be reflected back into the domain or biased to return back
toward a position in the problem domain. Alternatively, a wrapping
strategy may be used at the edge of the domain creating a loop, toroid
or related geometrical structure at the chosen dimensionality.

• An inertia or momentum coefficient can be introduced to limit the 
change in velocity. 
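The inertia heuristic can be sketched as a small change to the velocity update. The example below is a minimal illustration under stated assumptions: the function name update_velocity_with_inertia and the parameter name w are hypothetical, and the form shown (scaling the previous velocity by w before adding the personal and global attraction terms) is one commonly used variant rather than the only possibility.

```ruby
# Returns a new velocity vector where the previous velocity is scaled by an
# inertia weight w (assumed name) before the attraction terms are added.
def update_velocity_with_inertia(velocity, position, p_best, g_best, w, c1, c2)
  velocity.each_index.map do |i|
    cognitive = c1 * rand() * (p_best[i] - position[i]) # pull to personal best
    social = c2 * rand() * (g_best[i] - position[i])    # pull to global best
    (w * velocity[i]) + cognitive + social
  end
end

# Illustrative values: a particle already sitting on both best positions
# simply has its velocity damped by w.
velocity = [1.0, -2.0]
position = [0.5, 0.5]
damped = update_velocity_with_inertia(velocity, position, position, position,
  0.5, 2.0, 2.0)
puts "damped velocity: #{damped.inspect}"
```

With w below 1.0 the previous velocity contributes less each iteration, which limits how quickly the velocity can grow.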

6.2.7 Code Listing 

Algorithm 6.2.1: Pseudocode for PSO.

Input: ProblemSize, Population_size
Output: P_g_best
1  Population ← ∅;
2  P_g_best ← ∅;
3  for i = 1 to Population_size do
4    P_velocity ← RandomVelocity();
5    P_position ← RandomPosition(Population_size);
6    P_cost ← Cost(P_position);
7    P_p_best ← P_position;
8    if P_cost ≤ P_g_best then
9      P_g_best ← P_p_best;
10   end
11 end
12 while ¬StopCondition() do
13   foreach P ∈ Population do
14     P_velocity ← UpdateVelocity(P_velocity, P_g_best, P_p_best);
15     P_position ← UpdatePosition(P_position, P_velocity);
16     P_cost ← Cost(P_position);
17     if P_cost ≤ P_p_best then
18       P_p_best ← P_position;
19       if P_cost ≤ P_g_best then
20         P_g_best ← P_p_best;
21       end
22     end
23   end
24 end
25 return P_g_best;

Listing 6.1 provides an example of the Particle Swarm Optimization
algorithm implemented in the Ruby Programming Language. The demonstration
problem is an instance of a continuous function optimization that seeks
$\min f(x)$ where $f = \sum_{i=1}^{n} x_i^2$, $-5.0 \leq x_i \leq 5.0$ and $n = 2$. The optimal
solution for this basin function is $(v_0, \ldots, v_{n-1}) = 0.0$. The algorithm is a
conservative version of Particle Swarm Optimization based on the seminal
papers. The implementation limits the velocity at a pre-defined maximum,
and bounds particles to the search space, reflecting their movement and
velocity if the bounds of the space are exceeded. Particles are influenced by
the best position found as well as their own personal best position. Natural
extensions may consider limiting velocity with an inertia coefficient and
including a neighborhood function for the particles.

def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def create_particle(search_space, vel_space)
  particle = {}
  particle[:position] = random_vector(search_space)
  particle[:cost] = objective_function(particle[:position])
  particle[:b_position] = Array.new(particle[:position])
  particle[:b_cost] = particle[:cost]
  particle[:velocity] = random_vector(vel_space)
  return particle
end

def get_global_best(population, current_best=nil)
  # sort in place so that the first particle is the lowest-cost one
  population.sort! {|x,y| x[:cost] <=> y[:cost]}
  best = population.first
  if current_best.nil? or best[:cost] <= current_best[:cost]
    current_best = {}
    current_best[:position] = Array.new(best[:position])
    current_best[:cost] = best[:cost]
  end
  return current_best
end

def update_velocity(particle, gbest, max_v, c1, c2)
  particle[:velocity].each_with_index do |v,i|
    v1 = c1 * rand() * (particle[:b_position][i] - particle[:position][i])
    v2 = c2 * rand() * (gbest[:position][i] - particle[:position][i])
    particle[:velocity][i] = v + v1 + v2
    particle[:velocity][i] = max_v if particle[:velocity][i] > max_v
    particle[:velocity][i] = -max_v if particle[:velocity][i] < -max_v
  end
end

def update_position(part, bounds)
  part[:position].each_with_index do |v,i|
    part[:position][i] = v + part[:velocity][i]
    if part[:position][i] > bounds[i][1]
      part[:position][i]=bounds[i][1]-(part[:position][i]-bounds[i][1]).abs
      part[:velocity][i] *= -1.0
    elsif part[:position][i] < bounds[i][0]
      part[:position][i]=bounds[i][0]+(part[:position][i]-bounds[i][0]).abs
      part[:velocity][i] *= -1.0
    end
  end
end

def update_best_position(particle)
  return if particle[:cost] > particle[:b_cost]
  particle[:b_cost] = particle[:cost]
  particle[:b_position] = Array.new(particle[:position])
end

def search(max_gens, search_space, vel_space, pop_size, max_vel, c1, c2)
  pop = Array.new(pop_size) {create_particle(search_space, vel_space)}
  gbest = get_global_best(pop)
  max_gens.times do |gen|
    pop.each do |particle|
      update_velocity(particle, gbest, max_vel, c1, c2)
      update_position(particle, search_space)
      particle[:cost] = objective_function(particle[:position])
      update_best_position(particle)
    end
    gbest = get_global_best(pop, gbest)
    puts " > gen #{gen+1}, fitness=#{gbest[:cost]}"
  end
  return gbest
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {|i| [-5, 5]}
  # algorithm configuration
  vel_space = Array.new(problem_size) {|i| [-1, 1]}
  max_gens = 100
  pop_size = 50
  max_vel = 100.0
  c1, c2 = 2.0, 2.0
  # execute the algorithm
  best = search(max_gens, search_space, vel_space, pop_size, max_vel, c1, c2)
  puts "done! Solution: f=#{best[:cost]}, s=#{best[:position].inspect}"
end

Listing 6.1: Particle Swarm Optimization in Ruby 



6.2.8 References 

Primary Sources 

Particle Swarm Optimization was described as a stochastic global
optimization method for continuous functions in 1995 by Eberhart and Kennedy
[1, 3]. This work was motivated as an optimization method loosely based
on the flocking behavioral models of Reynolds [7]. Early works included the
introduction of inertia [8] and early study of social topologies in the swarm
by Kennedy [2].

Learn More 

Poli, Kennedy, and Blackwell provide a modern overview of the field of
PSO with detailed coverage of extensions to the baseline technique [6]. Poli
provides a meta-analysis of PSO publications that focus on the application
of the technique, providing a systematic breakdown on application areas [5].
An excellent book on Swarm Intelligence in general with detailed coverage of
Particle Swarm Optimization is "Swarm Intelligence" by Kennedy, Eberhart,
and Shi [4].
6.3 Ant System

Ant System, AS, Ant Cycle. 

6.3.1 Taxonomy 

The Ant System algorithm is an example of an Ant Colony Optimization
method from the field of Swarm Intelligence, Metaheuristics and
Computational Intelligence. Ant System was originally the term used to refer to a
range of Ant based algorithms, where the specific algorithm implementation
was referred to as Ant Cycle. The so-called Ant Cycle algorithm is now
canonically referred to as Ant System. The Ant System algorithm is the
baseline Ant Colony Optimization method for popular extensions such as
Elite Ant System, Rank-based Ant System, Max-Min Ant System, and Ant
Colony System.

6.3.2 Inspiration 

The Ant System algorithm is inspired by the foraging behavior of ants,
specifically the pheromone communication between ants regarding a good path
between the colony and a food source in an environment. This mechanism
is called stigmergy.

6.3.3 Metaphor 

Ants initially wander randomly around their environment. Once food
is located an ant will begin laying down pheromone in the environment.
Numerous trips between the food and the colony are performed and if the
same route is followed that leads to food then additional pheromone is laid
down. Pheromone decays in the environment, so that older paths are less
likely to be followed. Other ants may discover the same path to the food
and in turn may follow it and also lay down pheromone. A positive feedback
process routes more and more ants to productive paths that are in turn
further refined through use.

6.3.4 Strategy 

The objective of the strategy is to exploit historic and heuristic information
to construct candidate solutions and fold the information learned from
constructing solutions into the history. Solutions are constructed one discrete
piece at a time in a probabilistic step-wise manner. The probability of
selecting a component is determined by the heuristic contribution of the
component to the overall cost of the solution and the quality of solutions
in which the component is historically known to have been included.
History is updated proportional to the quality of candidate solutions and
is uniformly decreased ensuring the most recent and useful information is
retained.



6.3.5 Procedure 

Algorithm 6.3.1 provides a pseudocode listing of the main Ant System
algorithm for minimizing a cost function. The pheromone update process
is described by a single equation that combines the contributions of all
candidate solutions with a decay coefficient to determine the new pheromone
value, as follows:

$\tau_{i,j} \leftarrow (1 - \rho) \times \tau_{i,j} + \sum_{k=1}^{m} \Delta_{i,j}^{k}$ (6.2)

where $\tau_{i,j}$ represents the pheromone for the component (graph edge)
$(i, j)$, $\rho$ is the decay factor, $m$ is the number of ants, and $\sum_{k=1}^{m} \Delta_{i,j}^{k}$ is the
sum of $\frac{1}{S_{cost}}$ (maximizing solution cost) for those solutions that include
component $i, j$. The pseudocode listing shows this equation as an equivalent
two-step process of decay followed by update for simplicity.
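The two-step form of the pheromone update (decay, then deposit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the book's listing: the function name update_pheromone, the hash keys :vector and :cost, and the treatment of each solution as a closed tour on a symmetric graph are all choices made for the example.

```ruby
# pheromone: square matrix (array of arrays) of pheromone values.
# tours: array of hashes with a :vector (permutation of node indexes)
# and a :cost. Assumes symmetric edges and closed tours.
def update_pheromone(pheromone, tours, decay)
  # step 1: decay every pheromone value by the factor (1 - decay)
  pheromone.each { |row| row.map! { |tau| (1.0 - decay) * tau } }
  # step 2: each tour deposits 1/cost on every edge it contains
  tours.each do |tour|
    deposit = 1.0 / tour[:cost]
    tour[:vector].each_with_index do |from, idx|
      to = tour[:vector][(idx + 1) % tour[:vector].size]
      pheromone[from][to] += deposit
      pheromone[to][from] += deposit
    end
  end
  pheromone
end

# Illustrative values: four nodes, one tour of cost 2.0, decay of 0.5.
trail = Array.new(4) { Array.new(4, 1.0) }
update_pheromone(trail, [{:vector => [0, 1, 2, 3], :cost => 2.0}], 0.5)
puts "edge on tour: #{trail[0][1]}, edge off tour: #{trail[0][2]}"
```

Edges that appear in low-cost tours receive larger deposits, while edges that are not used only decay.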

The probabilistic stepwise construction of a solution makes use of both
history (pheromone) and problem-specific heuristic information to
incrementally construct a solution piece-by-piece. Each component can only be
selected if it has not already been chosen (for most combinatorial problems),
and for those components that can be selected from (given the current
component $i$), their probability for selection is defined as:

$$P_{i,j} \leftarrow \frac{\tau_{i,j}^{\alpha} \times \eta_{i,j}^{\beta}}{\sum_{k=1}^{c} \tau_{i,k}^{\alpha} \times \eta_{i,k}^{\beta}} \qquad (6.3)$$

where $\eta_{i,j}$ is the maximizing contribution to the overall score of
selecting the component (such as $1.0/distance_{i,j}$ for the Traveling
Salesman Problem), $\beta$ is the heuristic coefficient, $\tau_{i,j}$ is the
pheromone value for the component, $\alpha$ is the history coefficient, and
$c$ is the set of usable components.
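A small worked example may help to see how pheromone and heuristic information combine in the selection probability. This sketch assumes the pheromone value is raised to the history exponent and the inverse distance to the heuristic exponent, as in the code listing for this algorithm. The helper name `selection_probabilities` and the sample values are hypothetical.

```ruby
# Selection probabilities for one construction step.
# alpha: history (pheromone) exponent, beta: heuristic exponent.
def selection_probabilities(pheromone, distances, alpha, beta)
  weights = pheromone.zip(distances).map do |tau, dist|
    (tau**alpha) * ((1.0 / dist)**beta)
  end
  total = weights.sum
  weights.map { |w| w / total }
end

pheromone = [0.2, 0.2, 0.4]     # tau for three candidate edges (made-up values)
distances = [10.0, 20.0, 20.0]  # edge lengths; the heuristic is 1/distance
probs = selection_probabilities(pheromone, distances, 1.0, 2.5)
probs.each_with_index { |p, i| puts "candidate #{i}: #{p.round(3)}" }
```

With these values the short edge is favoured even though it carries less pheromone than the third candidate, because the heuristic exponent weights distance more heavily than history.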



6.3.6 Heuristics 

• The Ant System algorithm was designed for use with combinatorial
problems such as the TSP, knapsack problem, quadratic assignment
problems, graph coloring problems and many others.

• The history coefficient (α) controls the amount of contribution history
plays in a component's probability of selection and is commonly set to
1.0.

Algorithm 6.3.1: Pseudocode for Ant System.

Input: ProblemSize, Population_size, m, ρ, α, β
Output: P_best

P_best ← CreateHeuristicSolution(ProblemSize);
Pbest_cost ← Cost(P_best);
Pheromone ← InitializePheromone(Pbest_cost);
while ¬StopCondition() do
    Candidates ← ∅;
    for i = 1 to m do
        S_i ← ProbabilisticStepwiseConstruction(Pheromone, ProblemSize, α, β);
        Si_cost ← Cost(S_i);
        if Si_cost < Pbest_cost then
            Pbest_cost ← Si_cost;
            P_best ← S_i;
        end
        Candidates ← S_i;
    end
    DecayPheromone(Pheromone, ρ);
    foreach S_i ∈ Candidates do
        UpdatePheromone(Pheromone, S_i, Si_cost);
    end
end
return P_best;



• The heuristic coefficient (β) controls the amount of contribution
problem-specific heuristic information plays in a component's
probability of selection and is commonly between 2 and 5, such as 2.5.

• The decay factor (ρ) controls the rate at which historic information is
lost and is commonly set to 0.5.

• The total number of ants (m) is commonly set to the number of
components in the problem, such as the number of cities in the TSP.

6.3.7 Code Listing 

Listing 6.2 provides an example of the Ant System algorithm implemented in
the Ruby Programming Language. The algorithm is applied to the Berlin52
instance of the Traveling Salesman Problem (TSP), taken from the TSPLIB.
The problem seeks a permutation of the order to visit cities (called a tour)
that minimizes the total distance traveled. The optimal tour distance
for the Berlin52 instance is 7542 units. Some extensions to the algorithm
implementation for speed improvements may consider pre-calculating a
distance matrix for all the cities in the problem, and pre-computing a
probability matrix for choices during the probabilistic stepwise construction
of tours.

# Rounded Euclidean distance between two city coordinates.
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# Total length of the closed tour described by the permutation.
def cost(permutation, cities)
  distance = 0
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

# Random tour generated with an in-place shuffle.
def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# Every edge starts with the same pheromone value.
def initialise_pheromone_matrix(num_cities, naive_score)
  v = num_cities.to_f / naive_score
  return Array.new(num_cities){|i| Array.new(num_cities, v)}
end

# Weight each unvisited city by pheromone (history) and inverse distance.
def calculate_choices(cities, last_city, exclude, pheromone, c_heur, c_hist)
  choices = []
  cities.each_with_index do |coord, i|
    next if exclude.include?(i)
    prob = {:city=>i}
    prob[:history] = pheromone[last_city][i] ** c_hist
    prob[:distance] = euc_2d(cities[last_city], coord)
    prob[:heuristic] = (1.0/prob[:distance]) ** c_heur
    prob[:prob] = prob[:history] * prob[:heuristic]
    choices << prob
  end
  choices
end

# Roulette-wheel selection over the weighted choices.
def select_next_city(choices)
  sum = choices.inject(0.0){|acc,element| acc + element[:prob]}
  return choices[rand(choices.size)][:city] if sum == 0.0
  v = rand()
  choices.each_with_index do |choice, i|
    v -= (choice[:prob]/sum)
    return choice[:city] if v <= 0.0
  end
  return choices.last[:city]
end

# Build a tour one city at a time from a random starting city.
def stepwise_const(cities, phero, c_heur, c_hist)
  perm = []
  perm << rand(cities.size)
  begin
    choices = calculate_choices(cities, perm.last, perm, phero, c_heur, c_hist)
    next_city = select_next_city(choices)
    perm << next_city
  end until perm.size == cities.size
  return perm
end

# Uniformly evaporate pheromone on every edge.
def decay_pheromone(pheromone, decay_factor)
  pheromone.each do |array|
    array.each_with_index do |p, i|
      array[i] = (1.0 - decay_factor) * p
    end
  end
end

# Deposit 1/cost on every edge used by each candidate tour.
def update_pheromone(pheromone, solutions)
  solutions.each do |other|
    other[:vector].each_with_index do |x, i|
      y=(i==other[:vector].size-1) ? other[:vector][0] : other[:vector][i+1]
      pheromone[x][y] += (1.0 / other[:cost])
      pheromone[y][x] += (1.0 / other[:cost])
    end
  end
end

def search(cities, max_it, num_ants, decay_factor, c_heur, c_hist)
  best = {:vector=>random_permutation(cities)}
  best[:cost] = cost(best[:vector], cities)
  pheromone = initialise_pheromone_matrix(cities.size, best[:cost])
  max_it.times do |iter|
    solutions = []
    num_ants.times do
      candidate = {}
      candidate[:vector] = stepwise_const(cities, pheromone, c_heur, c_hist)
      candidate[:cost] = cost(candidate[:vector], cities)
      best = candidate if candidate[:cost] < best[:cost]
      # keep every tour so that it contributes to the pheromone update
      solutions << candidate
    end
    decay_pheromone(pheromone, decay_factor)
    update_pheromone(pheromone, solutions)
    puts " > iteration #{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_it = 50
  num_ants = 30
  decay_factor = 0.6
  c_heur = 2.5
  c_hist = 1.0
  # execute the algorithm
  best = search(berlin52, max_it, num_ants, decay_factor, c_heur, c_hist)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end



Listing 6.2: Ant System in Ruby 
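
The distance-matrix speed-up mentioned before the listing could be sketched as follows. This is an illustrative assumption about how such a table might be built and used, not the book's implementation: the helpers `distance_matrix` and `cost_with_matrix` and the three-city instance are hypothetical.

```ruby
# Rounded Euclidean distance between two city coordinates.
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# Build an n x n table so that matrix[i][j] is the distance between cities i and j.
def distance_matrix(cities)
  cities.map { |a| cities.map { |b| euc_2d(a, b) } }
end

# Tour cost using table lookups instead of repeated square roots.
def cost_with_matrix(permutation, matrix)
  permutation.each_with_index.sum do |c1, i|
    c2 = permutation[(i + 1) % permutation.size]
    matrix[c1][c2]
  end
end

cities = [[0, 0], [3, 0], [3, 4]]   # tiny illustrative instance
matrix = distance_matrix(cities)
puts cost_with_matrix([0, 1, 2], matrix)
```

The table costs O(n^2) memory, which is small for a 52-city instance, and replaces every distance computation made while constructing and evaluating tours with an array lookup.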



6.3.8 References 

Primary Sources 

The Ant System was described by Dorigo, Maniezzo, and Colorni in an
early technical report as a class of algorithms and was applied to a number
of standard combinatorial optimization problems [4]. A series of technical
reports at this time investigated the class of algorithms called Ant System
and the specific implementation called Ant Cycle. This effort contributed
to Dorigo's PhD thesis published in Italian [2]. The seminal publication
into the investigation of Ant System (with the implementation still referred
to as Ant Cycle) was by Dorigo in 1996 [3].

Learn More 

The seminal book on Ant Colony Optimization in general with a detailed
treatment of Ant System is "Ant colony optimization" by Dorigo and Stützle
[5]. An earlier book "Swarm intelligence: from natural to artificial systems"
by Bonabeau, Dorigo, and Theraulaz also provides an introduction to Swarm
Intelligence with a detailed treatment of Ant System [1].

6.4 Ant Colony System

Ant Colony System, ACS, Ant-Q. 

6.4.1 Taxonomy 

The Ant Colony System algorithm is an example of an Ant Colony
Optimization method from the field of Swarm Intelligence, Metaheuristics and
Computational Intelligence. Ant Colony System is an extension to the Ant
System algorithm and is related to other Ant Colony Optimization methods
such as Elite Ant System, and Rank-based Ant System.

6.4.2 Inspiration 

The Ant Colony System algorithm is inspired by the foraging behavior of 
ants, specifically the pheromone communication between ants regarding a 
good path between the colony and a food source in an environment. This 
mechanism is called stigmergy. 

6.4.3 Metaphor 

Ants initially wander randomly around their environment. Once food 
is located an ant will begin laying down pheromone in the environment. 
Numerous trips between the food and the colony are performed and if the 
same route is followed that leads to food then additional pheromone is laid 
down. Pheromone decays in the environment, so that older paths are less 
likely to be followed. Other ants may discover the same path to the food 
and in turn may follow it and also lay down pheromone. A positive feedback 
process routes more and more ants to productive paths that are in turn
further refined through use. 

6.4.4 Strategy 

The objective of the strategy is to exploit historic and heuristic information
to construct candidate solutions and fold the information learned from
constructing solutions into the history. Solutions are constructed one discrete
piece at a time in a probabilistic stepwise manner. The probability of
selecting a component is determined by the heuristic contribution of the
component to the overall cost of the solution and the quality of the solutions
in which the component is historically known to have been included.
History is updated proportional to the quality of the best known solution
and is decreased proportional to the usage of discrete solution components.

6.4.5 Procedure

Algorithm 6.4.1 provides a pseudocode listing of the main Ant Colony
System algorithm for minimizing a cost function. The probabilistic
stepwise construction of a solution makes use of both history (pheromone) and
problem-specific heuristic information to incrementally construct a solution
piece-by-piece. Each component can only be selected if it has not already
been chosen (for most combinatorial problems), and for those components
that can be selected from given the current component $i$, their probability
for selection is defined as:

$$P_{i,j} \leftarrow \frac{\tau_{i,j}^{\alpha} \times \eta_{i,j}^{\beta}}{\sum_{k=1}^{c} \tau_{i,k}^{\alpha} \times \eta_{i,k}^{\beta}} \qquad (6.4)$$

where $\eta_{i,j}$ is the maximizing contribution to the overall score of selecting
the component (such as $1.0/distance_{i,j}$ for the Traveling Salesman Problem),
$\beta$ is the heuristic coefficient, $\tau_{i,j}$ is the pheromone
value for the component, $\alpha$ is the history coefficient (commonly fixed
at 1.0), and $c$ is the set of usable components. A greediness factor ($q_0$)
is used to influence when to use the above probabilistic component selection
and when to greedily select the best possible component.
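
The role of the greediness factor can be sketched as a single selection function. This is a hedged illustration that assumes each choice carries a positive, unnormalised weight under the key `:prob`. The function name `select_component` and the explicit `rng` argument are assumptions made for this example.

```ruby
# choices: array of {city:, prob:} hashes where :prob is the unnormalised
# weight (pheromone term times heuristic term). q0 is the greediness factor.
def select_component(choices, q0, rng = Random.new)
  if rng.rand <= q0
    # Exploit: take the highest-weighted component.
    choices.max_by { |c| c[:prob] }[:city]
  else
    # Explore: roulette-wheel selection proportional to weight.
    total = choices.sum { |c| c[:prob] }
    v = rng.rand * total
    choices.each do |c|
      v -= c[:prob]
      return c[:city] if v <= 0.0
    end
    choices.last[:city]
  end
end

choices = [{ city: 0, prob: 0.1 }, { city: 1, prob: 0.6 }, { city: 2, prob: 0.3 }]
puts select_component(choices, 1.0)   # q0 = 1.0 always takes the greedy branch
```

With a greediness factor of 0.9, roughly nine in ten construction steps exploit the best-looking component and the remainder sample in proportion to weight.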

A local pheromone update is performed for each solution that is
constructed to dissuade subsequent solutions from using the same components
in the same order, as follows:

$$\tau_{i,j} \leftarrow (1 - \sigma) \times \tau_{i,j} + \sigma \times \tau_{i,j}^{0} \qquad (6.5)$$

where $\tau_{i,j}$ represents the pheromone for the component (graph edge)
$(i, j)$, $\sigma$ is the local pheromone factor, and $\tau_{i,j}^{0}$ is the
initial pheromone value.
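
The effect of the local update can be seen by applying it repeatedly to one reinforced edge. The sketch below is illustrative only: `local_update` is a hypothetical helper, and the initial pheromone assumes a 52-city problem with a starting tour cost of 8000.

```ruby
# Local update: move tau a fraction sigma of the way back towards tau0.
def local_update(tau, sigma, tau0)
  (1.0 - sigma) * tau + sigma * tau0
end

tau0 = 1.0 / (52 * 8000.0)   # assumed initial pheromone value
tau = 10 * tau0              # an edge that has been reinforced
5.times do |step|
  tau = local_update(tau, 0.1, tau0)
  puts "step #{step + 1}: tau/tau0 = #{(tau / tau0).round(3)}"
end
```

Each application shrinks the gap between the edge's pheromone and the initial value by the local pheromone factor, so edges that many ants use within an iteration become progressively less attractive.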

At the end of each iteration, the pheromone is updated and decayed
using the best candidate solution found thus far (or the best candidate
solution found for the iteration), as follows:

$$\tau_{i,j} \leftarrow (1 - \rho) \times \tau_{i,j} + \rho \times \Delta\tau_{i,j} \qquad (6.6)$$

where $\tau_{i,j}$ represents the pheromone for the component (graph edge)
$(i, j)$, $\rho$ is the decay factor, and $\Delta\tau_{i,j}$ is the maximizing
solution cost for the best solution found so far if the component $(i, j)$ is
used in the globally best known solution, otherwise it is 0.
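
A small numeric sketch of the global update follows. It assumes the maximizing solution cost is taken as 1/cost, and it follows the code listing for this algorithm in touching only the edges of the best tour, so edges off that tour are left unchanged rather than decayed. The function name `global_update` and the four-city matrix are hypothetical.

```ruby
# Global update applied only to the edges of the best tour (symmetric TSP).
def global_update(pheromone, best_tour, best_cost, rho)
  best_tour.each_with_index do |x, i|
    y = best_tour[(i + 1) % best_tour.size]
    value = (1.0 - rho) * pheromone[x][y] + rho * (1.0 / best_cost)
    pheromone[x][y] = value
    pheromone[y][x] = value
  end
  pheromone
end

pheromone = Array.new(4) { Array.new(4, 0.01) }
global_update(pheromone, [0, 1, 2, 3], 50.0, 0.1)
puts pheromone[0][1]   # edge on the best tour
puts pheromone[0][2]   # edge not on the best tour
```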



6.4.6 Heuristics 

• The Ant Colony System algorithm was designed for use with com- 
binatorial problems such as the TSP, knapsack problem, quadratic 
assignment problems, graph coloring problems and many others. 

Algorithm 6.4.1: Pseudocode for Ant Colony System.

Input: ProblemSize, Population_size, m, ρ, β, σ, q0
Output: P_best

P_best ← CreateHeuristicSolution(ProblemSize);
Pbest_cost ← Cost(P_best);
Pheromone_init ← 1.0 / (ProblemSize × Pbest_cost);
Pheromone ← InitializePheromone(Pheromone_init);
while ¬StopCondition() do
    for i = 1 to m do
        S_i ← ConstructSolution(Pheromone, ProblemSize, β, q0);
        Si_cost ← Cost(S_i);
        if Si_cost < Pbest_cost then
            Pbest_cost ← Si_cost;
            P_best ← S_i;
        end
        LocalUpdateAndDecayPheromone(Pheromone, S_i, Si_cost, σ);
    end
    GlobalUpdateAndDecayPheromone(Pheromone, P_best, Pbest_cost, ρ);
end
return P_best;



• The local pheromone (history) coefficient (σ) controls the amount of
contribution history plays in a component's probability of selection
and is commonly set to 0.1.

• The heuristic coefficient (β) controls the amount of contribution
problem-specific heuristic information plays in a component's
probability of selection and is commonly between 2 and 5, such as 2.5.

• The decay factor (ρ) controls the rate at which historic information is
lost and is commonly set to 0.1.

• The greediness factor (q0) is commonly set to 0.9.

• The total number of ants (m) is commonly set low, such as 10.

6.4.7 Code Listing 

Listing 6.3 provides an example of the Ant Colony System algorithm
implemented in the Ruby Programming Language. The algorithm is applied to
the Berlin52 instance of the Traveling Salesman Problem (TSP), taken from
the TSPLIB. The problem seeks a permutation of the order to visit cities
(called a tour) that minimizes the total distance traveled. The optimal tour
distance for the Berlin52 instance is 7542 units. Some extensions to the
algorithm implementation for speed improvements may consider pre-calculating
a distance matrix for all the cities in the problem, and pre-computing a
probability matrix for choices during the probabilistic stepwise construction
of tours.



# Rounded Euclidean distance between two city coordinates.
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# Total length of the closed tour described by the permutation.
def cost(permutation, cities)
  distance = 0
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

# Random tour generated with an in-place shuffle.
def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# Every edge starts with the same initial pheromone value.
def initialise_pheromone_matrix(num_cities, init_pher)
  return Array.new(num_cities){|i| Array.new(num_cities, init_pher)}
end

# Weight each unvisited city by pheromone (history) and inverse distance.
def calculate_choices(cities, last_city, exclude, pheromone, c_heur, c_hist)
  choices = []
  cities.each_with_index do |coord, i|
    next if exclude.include?(i)
    prob = {:city=>i}
    prob[:history] = pheromone[last_city][i] ** c_hist
    prob[:distance] = euc_2d(cities[last_city], coord)
    prob[:heuristic] = (1.0/prob[:distance]) ** c_heur
    prob[:prob] = prob[:history] * prob[:heuristic]
    choices << prob
  end
  return choices
end

# Roulette-wheel selection over the weighted choices.
def prob_select(choices)
  sum = choices.inject(0.0){|acc,element| acc + element[:prob]}
  return choices[rand(choices.size)][:city] if sum == 0.0
  v = rand()
  choices.each_with_index do |choice, i|
    v -= (choice[:prob]/sum)
    return choice[:city] if v <= 0.0
  end
  return choices.last[:city]
end

# Pick the choice with the largest weight.
def greedy_select(choices)
  return choices.max{|a,b| a[:prob]<=>b[:prob]}[:city]
end

# Build a tour, choosing greedily with probability c_greed at each step.
def stepwise_const(cities, phero, c_heur, c_greed)
  perm = []
  perm << rand(cities.size)
  begin
    choices = calculate_choices(cities, perm.last, perm, phero, c_heur, 1.0)
    greedy = rand() <= c_greed
    next_city = (greedy) ? greedy_select(choices) : prob_select(choices)
    perm << next_city
  end until perm.size == cities.size
  return perm
end

# Decay and reinforce the edges of the best tour.
def global_update_pheromone(phero, cand, decay)
  cand[:vector].each_with_index do |x, i|
    y = (i==cand[:vector].size-1) ? cand[:vector][0] : cand[:vector][i+1]
    value = ((1.0-decay)*phero[x][y]) + (decay*(1.0/cand[:cost]))
    phero[x][y] = value
    phero[y][x] = value
  end
end

# Pull the edges of a newly built tour back towards the initial pheromone.
def local_update_pheromone(pheromone, cand, c_local_phero, init_phero)
  cand[:vector].each_with_index do |x, i|
    y = (i==cand[:vector].size-1) ? cand[:vector][0] : cand[:vector][i+1]
    value = ((1.0-c_local_phero)*pheromone[x][y])+(c_local_phero*init_phero)
    pheromone[x][y] = value
    pheromone[y][x] = value
  end
end

def search(cities, max_it, num_ants, decay, c_heur, c_local_phero, c_greed)
  best = {:vector=>random_permutation(cities)}
  best[:cost] = cost(best[:vector], cities)
  init_pheromone = 1.0 / (cities.size.to_f * best[:cost])
  pheromone = initialise_pheromone_matrix(cities.size, init_pheromone)
  max_it.times do |iter|
    num_ants.times do
      cand = {}
      cand[:vector] = stepwise_const(cities, pheromone, c_heur, c_greed)
      cand[:cost] = cost(cand[:vector], cities)
      best = cand if cand[:cost] < best[:cost]
      local_update_pheromone(pheromone, cand, c_local_phero, init_pheromone)
    end
    global_update_pheromone(pheromone, best, decay)
    puts " > iteration #{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_it = 100
  num_ants = 10
  decay = 0.1
  c_heur = 2.5
  c_local_phero = 0.1
  c_greed = 0.9
  # execute the algorithm
  best = search(berlin52, max_it, num_ants, decay, c_heur, c_local_phero,
    c_greed)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 6.3: Ant Colony System in Ruby 



6.4.8 References 

Primary Sources 

The algorithm was initially investigated by Dorigo and Gambardella under
the name Ant-Q [2, 6]. It was renamed Ant Colony System and further
investigated first in a technical report by Dorigo and Gambardella [4], and
later published [3].

Learn More 

The seminal book on Ant Colony Optimization in general with a detailed
treatment of Ant Colony System is "Ant colony optimization" by Dorigo and
Stützle [5]. An earlier book "Swarm intelligence: from natural to artificial
systems" by Bonabeau, Dorigo, and Theraulaz also provides an introduction
to Swarm Intelligence with a detailed treatment of Ant Colony System [1].

6.5 Bees Algorithm

Bees Algorithm, BA. 

6.5.1 Taxonomy 

The Bees Algorithm belongs to Bee Inspired Algorithms and the field of
Swarm Intelligence, and more broadly the fields of Computational Intel- 
ligence and Metaheuristics. The Bees Algorithm is related to other Bee 
Inspired Algorithms, such as Bee Colony Optimization, and other Swarm 
Intelligence algorithms such as Ant Colony Optimization and Particle Swarm 
Optimization. 

6.5.2 Inspiration 

The Bees Algorithm is inspired by the foraging behavior of honey bees. 
Honey bees collect nectar from vast areas around their hive (more than 
10 kilometers). Bee Colonies have been observed to send bees to collect 
nectar from flower patches relative to the amount of food available at each 
patch. Bees communicate with each other at the hive via a waggle dance 
that informs other bees in the hive as to the direction, distance, and quality 
rating of food sources. 

6.5.3 Metaphor 

Honey bees collect nectar from flower patches as a food source for the
hive. The hive sends out scouts that locate patches of flowers and then
return to the hive and inform other bees about the fitness and location of
a food source via a waggle dance. The scout returns to the flower patch
with follower bees. A small number of scouts continue to search for new
patches, while bees returning from flower patches continue to communicate
the quality of the patch.

6.5.4 Strategy 

The information processing objective of the algorithm is to locate and 
explore good sites within a problem search space. Scouts are sent out to 
randomly sample the problem space and locate good sites. The good sites 
are exploited via the application of a local search, where a small number of 
good sites are explored more than the others. Good sites are continually 
exploited, although many scouts are sent out each iteration always in search 
of additional good sites. 
6.5.5 Procedure

Algorithm 6.5.1 provides a pseudocode listing of the Bees Algorithm for
minimizing a cost function.



Algorithm 6.5.1: Pseudocode for the Bees Algorithm.

Input: Problem_size, Bees_num, Sites_num, EliteSites_num,
       PatchSize_init, EliteBees_num, OtherBees_num
Output: Bee_best

Population ← InitializePopulation(Bees_num, Problem_size);
while ¬StopCondition() do
    EvaluatePopulation(Population);
    Bee_best ← GetBestSolution(Population);
    NextGeneration ← ∅;
    Patch_size ← (PatchSize_init × PatchDecrease_factor);
    Sites_best ← SelectBestSites(Population, Sites_num);
    foreach Site_i ∈ Sites_best do
        RecruitedBees_num ← 0;
        if i < EliteSites_num then
            RecruitedBees_num ← EliteBees_num;
        else
            RecruitedBees_num ← OtherBees_num;
        end
        Neighborhood ← ∅;
        for j to RecruitedBees_num do
            Neighborhood ← CreateNeighborhoodBee(Site_i, Patch_size);
        end
        NextGeneration ← GetBestSolution(Neighborhood);
    end
    RemainingBees_num ← (Bees_num − Sites_num);
    for j to RemainingBees_num do
        NextGeneration ← CreateRandomBee();
    end
    Population ← NextGeneration;
end
return Bee_best;



6.5.6 Heuristics 

• The Bees Algorithm was developed to be used with continuous and 
combinatorial function optimization problems. 

• The $Patch_{size}$ variable is used as the neighborhood size. For example,
in a continuous function optimization problem, each dimension of a
site would be sampled as $x_i \pm (rand() \times Patch_{size})$.

• The $Patch_{size}$ variable is decreased each iteration, typically by a
constant factor (such as 0.95).

• The number of elite sites ($EliteSites_{num}$) must be < the number
of sites ($Sites_{num}$), and the number of elite bees ($EliteBees_{num}$) is
traditionally > the number of other bees ($OtherBees_{num}$), so that
elite sites receive more recruits.
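
The neighborhood sampling and patch-size decrease described in these heuristics can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `neighbour`, the explicit `rng` argument, and the sample site are hypothetical, and out-of-range values are simply clamped to the bounds.

```ruby
# Sample one neighbour of a site: each dimension moves by at most patch_size,
# then is clamped to the search bounds.
def neighbour(site, patch_size, bounds, rng = Random.new)
  site.each_with_index.map do |x, i|
    offset = rng.rand * patch_size
    v = rng.rand < 0.5 ? x + offset : x - offset
    v.clamp(bounds[i][0], bounds[i][1])
  end
end

bounds = Array.new(3) { [-5.0, 5.0] }
site = [4.9, 0.0, -4.9]
patch_size = 3.0
3.times do
  puts neighbour(site, patch_size, bounds).inspect
  patch_size *= 0.95   # geometric shrink applied once per iteration
end
```

Because the patch size is multiplied by the same factor every iteration, after n iterations it equals the initial size times 0.95 to the power n, so the local search narrows steadily around the selected sites.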

6.5.7 Code Listing 

Listing 6.4 provides an example of the Bees Algorithm implemented in the
Ruby Programming Language. The demonstration problem is an instance of
a continuous function optimization that seeks $\min f(x)$ where
$f = \sum_{i=1}^{n} x_i^2$, $-5.0 \le x_i \le 5.0$ and $n = 3$. The optimal
solution for this basin function is $(v_0, \ldots, v_{n-1}) = 0.0$. The
algorithm is an implementation of the Bees Algorithm as described in the
seminal paper [2]. A fixed patch size decrease factor of 0.95 was applied
each iteration.

# Sum of squares (basin function); the minimum is 0.0 at the origin.
def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

# Uniform random point inside the given bounds.
def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def create_random_bee(search_space)
  return {:vector=>random_vector(search_space)}
end

# Perturb each dimension by up to patch_size and clamp to the bounds.
def create_neigh_bee(site, patch_size, search_space)
  vector = []
  site.each_with_index do |v,i|
    v = (rand()<0.5) ? v+rand()*patch_size : v-rand()*patch_size
    v = search_space[i][0] if v < search_space[i][0]
    v = search_space[i][1] if v > search_space[i][1]
    vector << v
  end
  bee = {}
  bee[:vector] = vector
  return bee
end

# Sample neigh_size bees around the parent and return the best of them.
def search_neigh(parent, neigh_size, patch_size, search_space)
  neigh = []
  neigh_size.times do
    neigh << create_neigh_bee(parent[:vector], patch_size, search_space)
  end
  neigh.each{|bee| bee[:fitness] = objective_function(bee[:vector])}
  return neigh.sort{|x,y| x[:fitness]<=>y[:fitness]}.first
end

def create_scout_bees(search_space, num_scouts)
  return Array.new(num_scouts) do
    create_random_bee(search_space)
  end
end

def search(max_gens, search_space, num_bees, num_sites, elite_sites,
    patch_size, e_bees, o_bees)
  best = nil
  pop = Array.new(num_bees){ create_random_bee(search_space) }
  max_gens.times do |gen|
    pop.each{|bee| bee[:fitness] = objective_function(bee[:vector])}
    pop.sort!{|x,y| x[:fitness]<=>y[:fitness]}
    best = pop.first if best.nil? or pop.first[:fitness] < best[:fitness]
    next_gen = []
    # elite sites recruit e_bees, the remaining selected sites recruit o_bees
    pop[0...num_sites].each_with_index do |parent, i|
      neigh_size = (i<elite_sites) ? e_bees : o_bees
      next_gen << search_neigh(parent, neigh_size, patch_size, search_space)
    end
    scouts = create_scout_bees(search_space, (num_bees-num_sites))
    pop = next_gen + scouts
    patch_size = patch_size * 0.95
    puts " > it=#{gen+1}, patch_size=#{patch_size}, f=#{best[:fitness]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 3
  search_space = Array.new(problem_size) {|i| [-5, 5]}
  # algorithm configuration
  max_gens = 500
  num_bees = 45
  num_sites = 3
  elite_sites = 1
  patch_size = 3.0
  e_bees = 7
  o_bees = 2
  # execute the algorithm
  best = search(max_gens, search_space, num_bees, num_sites, elite_sites,
    patch_size, e_bees, o_bees)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:vector].inspect}"
end

6.5.8 References

Primary Sources 

The Bees Algorithm was proposed by Pham et al. in a technical report in 
2005 [3], and later published [2]. In this work, the algorithm was applied to 
standard instances of continuous function optimization problems. 

Learn More 

The majority of the work on the algorithm has concerned its application to 
various problem domains. The following is a selection of popular application 
papers: the optimization of linear antenna arrays by Guney and Onay [1], 
the optimization of codebook vectors in the Learning Vector Quantization 
algorithm for classification by Pham et al. [5], optimization of neural net- 
works for classification by Pham et al. [6], and the optimization of clustering 
methods by Pham et al. [4]. 

6.6 Bacterial Foraging Optimization Algorithm

Bacterial Foraging Optimization Algorithm, BFOA, Bacterial Foraging 
Optimization, BFO. 

6.6.1 Taxonomy 

The Bacterial Foraging Optimization Algorithm belongs to the field of Bac- 
teria Optimization Algorithms and Swarm Optimization, and more broadly 
to the fields of Computational Intelligence and Metaheuristics. It is related 
to other Bacteria Optimization Algorithms such as the Bacteria Chemotaxis 
Algorithm [3], and other Swarm Intelligence algorithms such as Ant Colony 
Optimization and Particle Swarm Optimization. There have been many 
extensions of the approach that attempt to hybridize the algorithm with 
other Computational Intelligence algorithms and Metaheuristics such as 
Particle Swarm Optimization, Genetic Algorithm, and Tabu Search. 

6.6.2 Inspiration 

The Bacterial Foraging Optimization Algorithm is inspired by the group 
foraging behavior of bacteria such as E.coli and M.xanthus. Specifically, the 
BFOA is inspired by the chemotaxis behavior of bacteria that will perceive 
chemical gradients in the environment (such as nutrients) and move toward 
or away from specific signals. 

6.6.3 Metaphor 

Bacteria perceive the direction to food based on the gradients of chemicals 
in their environment. Similarly, bacteria secrete attracting and repelling 
chemicals into the environment and can perceive each other in a similar 
way. Using locomotion mechanisms (such as flagella) bacteria can move
around in their environment, sometimes moving chaotically (tumbling and
spinning), and other times moving in a directed manner that may be referred
to as swimming. Bacterial cells are treated like agents in an environment,
using their perception of food and other cells as motivation to move, and
stochastic tumbling and swimming-like movement to re-locate. Depending
on the cell-cell interactions, cells may swarm a food source, and/or may
aggressively repel or ignore each other.

6.6.4 Strategy 

The information processing strategy of the algorithm is to allow cells to 
stochastically and collectively swarm toward optima. This is achieved 
through a series of three processes on a population of simulated cells: 1) 
'Chemotaxis' where the cost of cells is derated by the proximity to other 
cells and cells move along the manipulated cost surface one at a time (the
majority of the work of the algorithm), 2) 'Reproduction' where only those 
cells that performed well over their lifetime may contribute to the next 
generation, and 3) 'Elimination-dispersal' where cells are discarded and new 
random samples are inserted with a low probability. 



6.6.5 Procedure 

Algorithm 6.6.1 provides a pseudocode listing of the Bacterial Foraging
Optimization Algorithm for minimizing a cost function. Algorithm 6.6.2
provides the pseudocode listing for the chemotaxis and swim behavior
of the BFOA algorithm. A bacterium's cost is derated by its interaction with
other cells. This interaction function ($g()$) is calculated as follows:

$$g(cell_k) = \sum_{i=1}^{S}\left[-d_{attr} \times \exp\left(-w_{attr} \times \sum_{m=1}^{P}(cell_m^k - other_m^i)^2\right)\right] + \sum_{i=1}^{S}\left[h_{repel} \times \exp\left(-w_{repel} \times \sum_{m=1}^{P}(cell_m^k - other_m^i)^2\right)\right]$$

where $cell_k$ is a given cell, $d_{attr}$ and $w_{attr}$ are attraction
coefficients, $h_{repel}$ and $w_{repel}$ are repulsion coefficients, $S$ is
the number of cells in the population, and $P$ is the number of dimensions
of a given cell's position vector.
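
The interaction function can be sketched directly from its definition as a sum of an attraction term and a repulsion term over every cell in the population. The following is an illustrative sketch rather than the book's listing: the function name `interaction`, the argument order, and the sample positions are assumptions, and cells are represented as plain arrays of coordinates.

```ruby
# Attraction/repulsion term summed over every cell in the population.
def interaction(cell, population, d_attr, w_attr, h_rep, w_rep)
  population.sum(0.0) do |other|
    # squared Euclidean distance between the two position vectors
    sq_dist = cell.zip(other).sum(0.0) { |a, b| (a - b)**2 }
    -d_attr * Math.exp(-w_attr * sq_dist) + h_rep * Math.exp(-w_rep * sq_dist)
  end
end

population = [[0.0, 0.0], [0.5, 0.0], [3.0, 3.0]]   # illustrative positions
population.each do |cell|
  g = interaction(cell, population, 0.1, 0.2, 0.1, 10.0)
  puts "#{cell.inspect}: g = #{g.round(5)}"
end
```

When the attraction depth and repulsion height are equal, a cell's interaction with itself cancels to zero. With a wide attraction term and a narrow repulsion term, moderately close neighbours lower the derated cost (attraction), while the repulsion term only matters at very short range.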

The remaining parameters of the algorithm are as follows: $Cells_{num}$ is
the number of cells maintained in the population, $N_{ed}$ is the number of
elimination-dispersal steps, $N_{re}$ is the number of reproduction steps,
$N_c$ is the number of chemotaxis steps, $N_s$ is the number of swim steps for
a given cell, $Step_{size}$ is the size of the step taken along a random
direction vector with the same number of dimensions as the problem space
and each value $\in [-1, 1]$, and $P_{ed}$ is the probability of a cell being
subjected to elimination and dispersal.



6.6.6 Heuristics 

• The algorithm was designed for application to continuous function 
optimization problem domains. 

• Given the loops in the algorithm, it can be configured numerous ways 
to elicit different search behavior. It is common to have a large number 
of chemotaxis iterations, and small numbers of the other iterations. 

• The default coefficients for swarming behavior (cell-cell interactions)
are as follows: $d_{attract} = 0.1$, $w_{attract} = 0.2$,
$h_{repellant} = d_{attract}$, and $w_{repellant} = 10$.

• The step size is commonly a small fraction of the search space, such 
as 0.1. 

Algorithm 6.6.1: Pseudocode for the BFOA.

Input: Problem_size, Cells_num, N_ed, N_re, N_c, N_s, Step_size,
       d_attract, w_attract, h_repellant, w_repellant, P_ed
Output: Cell_best

Population ← InitializePopulation(Cells_num, Problem_size);
for l = 0 to N_ed do
    for k = 0 to N_re do
        for j = 0 to N_c do
            ChemotaxisAndSwim(Population, Problem_size, Cells_num, N_s,
                Step_size, d_attract, w_attract, h_repellant, w_repellant);
            foreach Cell ∈ Population do
                if Cost(Cell) < Cost(Cell_best) then
                    Cell_best ← Cell;
                end
            end
        end
        SortByCellHealth(Population);
        Selected ← SelectByCellHealth(Population, Cells_num / 2);
        Population ← Selected;
        Population ← Selected;
    end
    foreach Cell ∈ Population do
        if Rand() < P_ed then
            Cell ← CreateCellAtRandomLocation();
        end
    end
end
return Cell_best;



• During reproduction, typically half the population with a low health 
metric are discarded, and two copies of each member from the first 
(high-health) half of the population are retained. 

• The probability of elimination and dispersal (P_ed) is commonly set 
quite large, such as 0.25. 
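
The reproduction heuristic above can be sketched as follows. This is an illustration under stated assumptions, not the listing's code: the helper name reproduce is hypothetical, each cell is assumed to be a hash with :vector and :sum_nutrients keys, and a lower accumulated cost is taken to mean better health because the problem is minimized.

```ruby
# Hypothetical helper: each cell is a hash with :vector and :sum_nutrients keys.
def reproduce(cells)
  # Lower accumulated cost means better health when minimizing.
  survivors = cells.sort_by { |c| c[:sum_nutrients] }.first(cells.size / 2)
  # Copy each survivor so the two offspring can later move independently.
  copies = survivors.map do |c|
    { vector: c[:vector].dup, sum_nutrients: c[:sum_nutrients] }
  end
  survivors + copies
end

cells = [4.0, 1.0, 3.0, 2.0].each_with_index.map do |health, i|
  { vector: [i.to_f], sum_nutrients: health }
end
next_gen = reproduce(cells)
puts next_gen.map { |c| c[:sum_nutrients] }.inspect
```

Copying the survivors, rather than reusing the same objects twice, is a design choice made here so that a later change to one offspring does not silently change its twin.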

6.6.7 Code Listing 

Listing 6.5 provides an example of the Bacterial Foraging Optimization 
Algorithm implemented in the Ruby Programming Language. The demonstration 
problem is an instance of a continuous function optimization that 
seeks min f(x) where f = \sum_{i=1}^{n} x_i^2, -5.0 ≤ x_i ≤ 5.0 and n = 2. The 
optimal solution for this basin function is (v_0, ..., v_{n-1}) = 0.0. The algorithm 
Algorithm 6.6.2: Pseudocode for the ChemotaxisAndSwim function. 

Input: Population, Problem_size, Cells_num, N_s, Step_size, d_attract, 
       w_attract, h_repellant, w_repellant 

foreach Cell ∈ Population do 
    Cell_fitness ← Cost(Cell) + Interaction(Cell, Population, 
        d_attract, w_attract, h_repellant, w_repellant); 
    Cell_health ← Cell_fitness; 
    Cell' ← ∅; 
    for i = 0 to N_s do 
        RandomStepDirection ← CreateStep(Problem_size); 
        Cell' ← TakeStep(RandomStepDirection, Step_size); 
        Cell'_fitness ← Cost(Cell') + Interaction(Cell', Population, 
            d_attract, w_attract, h_repellant, w_repellant); 
        if Cell'_fitness > Cell_fitness then 
            i ← N_s; 
        else 
            Cell ← Cell'; 
            Cell_health ← Cell_health + Cell'_fitness; 
        end 
    end 
end 



is an implementation based on the description in the seminal work [4]. 
The parameters for cell-cell interactions (attraction and repulsion) were 
taken from the paper, and the various loop parameters were taken from the 
'Swarming Effects' example. 

def objective_function(vector) 
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)} 
end 

def random_vector(minmax) 
  return Array.new(minmax.size) do |i| 
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand()) 
  end 
end 

def generate_random_direction(problem_size) 
  bounds = Array.new(problem_size) {[-1.0, 1.0]} 
  return random_vector(bounds) 
end 

# sum of d * exp(w * squared distance) over all cells in the population 
def compute_cell_interaction(cell, cells, d, w) 
  sum = 0.0 
  cells.each do |other| 
    diff = 0.0 
    cell[:vector].each_index do |i| 
      diff += (cell[:vector][i] - other[:vector][i])**2.0 
    end 
    sum += d * Math.exp(w * diff) 
  end 
  return sum 
end 

def attract_repel(cell, cells, d_attr, w_attr, h_rep, w_rep) 
  attract = compute_cell_interaction(cell, cells, -d_attr, -w_attr) 
  repel = compute_cell_interaction(cell, cells, h_rep, -w_rep) 
  return attract + repel 
end 

def evaluate(cell, cells, d_attr, w_attr, h_rep, w_rep) 
  cell[:cost] = objective_function(cell[:vector]) 
  cell[:inter] = attract_repel(cell, cells, d_attr, w_attr, h_rep, w_rep) 
  cell[:fitness] = cell[:cost] + cell[:inter] 
end 

# take one random step and clamp the new position to the search space 
def tumble_cell(search_space, cell, step_size) 
  step = generate_random_direction(search_space.size) 
  vector = Array.new(search_space.size) 
  vector.each_index do |i| 
    vector[i] = cell[:vector][i] + step_size * step[i] 
    vector[i] = search_space[i][0] if vector[i] < search_space[i][0] 
    vector[i] = search_space[i][1] if vector[i] > search_space[i][1] 
  end 
  return {:vector=>vector} 
end 

def chemotaxis(cells, search_space, chem_steps, swim_length, step_size, 
    d_attr, w_attr, h_rep, w_rep) 
  best = nil 
  chem_steps.times do |j| 
    moved_cells = [] 
    cells.each_with_index do |cell, i| 
      sum_nutrients = 0.0 
      evaluate(cell, cells, d_attr, w_attr, h_rep, w_rep) 
      best = cell if best.nil? or cell[:cost] < best[:cost] 
      sum_nutrients += cell[:fitness] 
      swim_length.times do |m| 
        new_cell = tumble_cell(search_space, cell, step_size) 
        evaluate(new_cell, cells, d_attr, w_attr, h_rep, w_rep) 
        # track the lowest-cost cell seen, including tumbles that are rejected 
        best = new_cell if new_cell[:cost] < best[:cost] 
        # stop swimming as soon as a step makes the fitness worse 
        break if new_cell[:fitness] > cell[:fitness] 
        cell = new_cell 
        sum_nutrients += cell[:fitness] 
      end 
      cell[:sum_nutrients] = sum_nutrients 
      moved_cells << cell 
    end 
    puts " >> chemo=#{j}, f=#{best[:fitness]}, cost=#{best[:cost]}" 
    cells = moved_cells 
  end 
  return [best, cells] 
end 
def search(search_space, pop_size, elim_disp_steps, repro_steps, 
    chem_steps, swim_length, step_size, d_attr, w_attr, h_rep, w_rep, 
    p_eliminate) 
  cells = Array.new(pop_size) { {:vector=>random_vector(search_space)} } 
  best = nil 
  elim_disp_steps.times do |l| 
    repro_steps.times do |k| 
      c_best, cells = chemotaxis(cells, search_space, chem_steps, 
        swim_length, step_size, d_attr, w_attr, h_rep, w_rep) 
      best = c_best if best.nil? or c_best[:cost] < best[:cost] 
      puts " > best fitness=#{best[:fitness]}, cost=#{best[:cost]}" 
      # sort in place (ascending accumulated cost) so the healthiest half is first 
      cells.sort!{|x,y| x[:sum_nutrients]<=>y[:sum_nutrients]} 
      cells = cells.first(pop_size/2) + cells.first(pop_size/2) 
    end 
    cells.each do |cell| 
      if rand() <= p_eliminate 
        cell[:vector] = random_vector(search_space) 
      end 
    end 
  end 
  return best 
end 

if __FILE__ == $0 
  # problem configuration 
  problem_size = 2 
  search_space = Array.new(problem_size) {|i| [-5, 5]} 
  # algorithm configuration 
  pop_size = 50 
  step_size = 0.1 # Ci 
  elim_disp_steps = 1 # Ned 
  repro_steps = 4 # Nre 
  chem_steps = 70 # Nc 
  swim_length = 4 # Ns 
  p_eliminate = 0.25 # Ped 
  d_attr = 0.1 
  w_attr = 0.2 
  h_rep = d_attr 
  w_rep = 10 
  # execute the algorithm 
  best = search(search_space, pop_size, elim_disp_steps, repro_steps, 
    chem_steps, swim_length, step_size, d_attr, w_attr, h_rep, w_rep, 
    p_eliminate) 
  puts "done! Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}" 
end 



Listing 6.5: Bacterial Foraging Optimization Algorithm in Ruby 

6.6.8 References 

Primary Sources 

Early work by Liu and Passino considered models of chemotaxis as 
optimization for both E. coli and M. xanthus which were applied to continuous 
function optimization [2]. This work was consolidated by Passino who 
presented the Bacterial Foraging Optimization Algorithm that included 
a detailed presentation of the algorithm, heuristics for configuration, and 
demonstration applications and behavior dynamics [4]. 

Learn More 

A detailed summary of social foraging and the BFOA is provided in the 
book by Passino [5]. Passino provides a follow-up review of the background 
models of chemotaxis as optimization and describes the equations of the 
Bacterial Foraging Optimization Algorithm in detail in a journal article [6]. 
Das et al. present the algorithm and its inspiration, and go on to provide an 
in-depth analysis of the dynamics of chemotaxis using simplified mathematical 
models [1]. 
Chapter 7 

Immune Algorithms 



7.1 Overview 

This chapter describes Immune Algorithms. 
7.1.1 Immune System 

Immune Algorithms belong to the Artificial Immune Systems field of study 
concerned with computational methods inspired by the processes and 
mechanisms of the biological immune system. 

A simplified description of the immune system is an organ system 
intended to protect the host organism from the threats posed to it from 
pathogens and toxic substances. Pathogens encompass a range of micro- 
organisms such as bacteria, viruses, parasites and pollen. The traditional 
perspective regarding the role of the immune system is divided into two 
primary tasks: the detection and elimination of pathogen. This behavior 
is typically referred to as the differentiation of self (molecules and cells 
that belong to the host organisms) from potentially harmful non-self. More 
recent perspectives on the role of the system include a maintenance system 
[3], and a cognitive system [22]. 

The architecture of the immune system is such that a series of defensive 
layers protect the host. Once a pathogen makes it inside the host, it must 
contend with the innate and acquired immune system. These interrelated im- 
munological sub-systems are comprised of many types of cells and molecules 
produced by specialized organs and processes to address the self-nonself 
problem at the lowest level using chemical bonding, where the surfaces of 
cells and molecules interact with the surfaces of pathogen. 

The adaptive immune system, also referred to as the acquired immune 
system, is named such because it is responsible for specializing a defense 
for the host organism based on the specific pathogen to which it is exposed. 
Unlike the innate immune system, the acquired immune system is present 
only in vertebrates (animals with a spinal column). The system retains a 
memory of exposures which it has encountered. This memory is recalled 
on reinfection exhibiting a learned pathogen identification. This learning 
process may be divided into two types of response. The first or primary 
response occurs when the system encounters a novel pathogen. The system is 
slow to respond, potentially taking a number of weeks to clear the infection. 
On re-encountering the same pathogen again, the system exhibits a secondary 
response, applying what was learned in the primary response and clearing 
up the infection rapidly. The memory the system acquires in the primary 
response is typically long lasting, providing pathogenic immunity for the 
lifetime of the host, two common examples of which are chickenpox and 
measles. White blood cells called lymphocytes (a type of leukocyte) are the 
most important cells in the acquired immune system. Lymphocytes are involved 
in both the identification and elimination of pathogens, and recirculate within 
the host organism's body in the blood and lymph (the fluid that permeates 
tissue). 

7.1.2 Artificial Immune Systems 

Artificial Immune Systems (AIS) is a sub-field of Computational Intelli- 
gence motivated by immunology (primarily mammalian immunology) that 
emerged in the early 1990s (for example [1, 15]), based on the proposal in the 
late 1980s to apply theoretical immunological models to machine learning 
and automated problem solving (such as [9, 12]). The early works in the 
field were inspired by exotic theoretical models (immune network theory) 
and were applied to machine learning, control and optimization problems. 
The approaches were reminiscent of paradigms such as Artificial Neural 
Networks, Genetic Algorithms, Reinforcement Learning, and Learning Clas- 
sifier Systems. The most formative works in giving the field an identity 
were those that proposed the immune system as an analogy for information 
protection systems in the field of computer security. The classical examples 
include Forrest et al.'s Computer Immunity [10, 11] and Kephart's Immune 
Anti-Virus [17, 18]. These works were formative for the field because they 
provided an intuitive application domain that captivated a broader audience 
and assisted in differentiating the work as an independent sub-field. 

Modern Artificial Immune Systems are inspired by one of three sub- 
fields: clonal selection, negative selection and immune network algorithms. 
The techniques are commonly used for clustering, pattern recognition, 
classification, optimization, and other similar machine learning problem 
domains. 

The seminal reference for those interested in the field is the textbook by 
de Castro and Timmis, "Artificial Immune Systems: A New Computational 
Intelligence Approach" [8]. This reference text provides an introduction 
to immunology with a level of detail appropriate for a computer scientist, 
followed by a summary of the state of the art, algorithms, application areas, 
and case studies. 

7.1.3 Extensions 

There are many other algorithms and classes of algorithm from the field of 
Artificial Immune Systems that were not described, including but not limited to: 

• Clonal Selection Algorithms: such as the B-Cell Algorithm [16], 
the Multi-objective Immune System Algorithm (MISA) [2, 4], the 
Optimization Immune Algorithm (opt-IA, opt-IMMALG) [5, 6], 
and the Simple Immunological Algorithm [7]. 

• Immune Network Algorithms: such as the approach by Timmis 
used for clustering called the Artificial Immune Network (AIN) [20] 
(later extended and renamed the Resource Limited Artificial Immune 
System) [19, 21]. 

• Negative Selection Algorithms: such as an adaptive framework 
called the ARTificial Immune System (ARTIS), with the application 
to intrusion detection renamed the Lightweight Intrusion Detection 
System (LISYS) [13, 14]. 
7.2 Clonal Selection Algorithm 

Clonal Selection Algorithm, CSA, CLONALG. 

7.2.1 Taxonomy 

The Clonal Selection Algorithm (CLONALG) belongs to the field of Artifi- 
cial Immune Systems. It is related to other Clonal Selection Algorithms such 
as the Artificial Immune Recognition System (Section 7.4), the B-Cell Algo- 
rithm (BCA), and the Multi-objective Immune System Algorithm (MISA). 
There are numerous extensions to CLONALG including tweaks such as the 
CLONALG1 and CLONALG2 approaches, a version for classification called 
CLONCLAS, and an adaptive version called Adaptive Clonal Selection 
(ACS). 

7.2.2 Inspiration 

The Clonal Selection algorithm is inspired by the Clonal Selection theory 
of acquired immunity. The clonal selection theory credited to Burnet was 
proposed to account for the behavior and capabilities of antibodies in the 
acquired immune system [2, 3]. Inspired itself by the principles of Darwinian 
natural selection theory of evolution, the theory proposes that antigens 
select for lymphocytes (both B and T-cells). When a lymphocyte is selected 
and binds to an antigenic determinant, the cell proliferates making many 
thousands more copies of itself and differentiates into different cell types 
(plasma and memory cells). Plasma cells have a short lifespan and produce 
vast quantities of antibody molecules, whereas memory cells live for an 
extended period in the host anticipating future recognition of the same 
determinant. The important feature of the theory is that when a cell is 
selected and proliferates, it is subjected to small copying errors (changes 
to the genome called somatic hypermutation) that change the shape of the 
expressed receptors and subsequent determinant recognition capabilities 
of both the antibodies bound to the lymphocyte's cell surface, and the 
antibodies that plasma cells produce. 

7.2.3 Metaphor 

The theory suggests that starting with an initial repertoire of general immune 
cells, the system is able to change itself (the compositions and densities of 
cells and their receptors) in response to experience with the environment. 
Through a blind process of selection and accumulated variation on the large 
scale of many billions of cells, the acquired immune system is capable of 
acquiring the necessary information to protect the host organism from the 
specific pathogenic dangers of the environment. It also suggests that the 
system must anticipate (guess at) the pathogens to which it will be exposed, 
and requires exposure to pathogens that may harm the host before it can 
acquire the necessary information to provide a defense. 

7.2.4 Strategy 

The information processing principles of the clonal selection theory describe 
a general learning strategy. This strategy involves a population of adaptive 
information units (each representing a problem-solution or component) 
subjected to a competitive process for selection, which together with the 
resultant duplication and variation ultimately improves the adaptive fit of 
the information units to their environment. 

7.2.5 Procedure 

Algorithm 7.2.1 provides a pseudocode listing of the Clonal Selection Algo- 
rithm (CLONALG) for minimizing a cost function. The general CLONALG 
model involves the selection of antibodies (candidate solutions) based on 
affinity either by matching against an antigen pattern or via evaluation of a 
pattern by a cost function. Selected antibodies are subjected to cloning pro- 
portional to affinity, and the hypermutation of clones inversely-proportional 
to clone affinity. The resultant clonal-set competes with the existent an- 
tibody population for membership in the next generation. In addition, 
low-affinity population members are replaced by randomly generated an- 
tibodies. The pattern recognition variation of the algorithm includes the 
maintenance of a memory solution set which in its entirety represents a 
solution to the problem. A binary-encoding scheme is employed for the 
binary-pattern recognition and continuous function optimization examples, 
and an integer permutation scheme is employed for the Traveling Salesman 
Problem (TSP). 
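
As an illustration of the binary encoding used for continuous function optimization, the sketch below maps a bitstring linearly onto a real-valued interval. It is a simplified, assumed form: the helper name decode_param and the 4-bit strings are hypothetical and chosen only to keep the arithmetic easy to follow.

```ruby
# Hypothetical helper: maps a bitstring such as "1010" onto the interval [min, max].
def decode_param(bits, min, max)
  levels = (2.0**bits.size) - 1.0   # largest integer the bitstring can encode
  min + (max - min) * (bits.to_i(2) / levels)
end

puts decode_param("0000", -5.0, 5.0)   # all zeros decode to the lower bound
puts decode_param("1111", -5.0, 5.0)   # all ones decode to the upper bound
puts decode_param("1000", -5.0, 5.0)   # a value just above the midpoint
```

More bits per parameter give a finer grid over the interval, at the cost of a longer string for the mutation operator to search.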

7.2.6 Heuristics 

• The CLONALG was designed as a general machine learning approach 
and has been applied to pattern recognition, function optimization, 
and combinatorial optimization problem domains. 

• Binary string representations are used and decoded to a representation 
suitable for a specific problem domain. 

• The number of clones created for each selected member is calculated 
as a function of the repertoire size N_c = round(β · N), where β is the 
user parameter Clone_rate. 

• A rank-based affinity-proportionate function is used to determine the 
number of clones created for selected members of the population for 
pattern recognition problem instances. 
Algorithm 7.2.1: Pseudocode for CLONALG. 

Input: Population_size, Selection_size, Problem_size, 
       RandomCells_num, Clone_rate, Mutation_rate 
Output: Population 

Population ← CreateRandomCells(Population_size, Problem_size); 
while ¬StopCondition() do 
    foreach p_i ∈ Population do 
        Affinity(p_i); 
    end 
    Population_select ← Select(Population, Selection_size); 
    Population_clones ← ∅; 
    foreach p_i ∈ Population_select do 
        Population_clones ← Clone(p_i, Clone_rate); 
    end 
    foreach p_i ∈ Population_clones do 
        Hypermutate(p_i, Mutation_rate); 
        Affinity(p_i); 
    end 
    Population ← Select(Population, Population_clones, Population_size); 
    Population_rand ← CreateRandomCells(RandomCells_num); 
    Replace(Population, Population_rand); 
end 
return Population; 



• The number of random antibodies inserted each iteration is typically 
very low (1-2). 

• Point mutations (bit-flips) are used in the hypermutation operation. 

• The function exp(-ρ · f) is used to determine the probability of 
individual component mutation for a given candidate solution, where 
f is the candidate's affinity (normalized maximizing cost value), and ρ 
is the user parameter Mutation_rate. 
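
The two formulas in the heuristics above, N_c = round(β · N) and exp(-ρ · f), can be checked with a short sketch. The function names are hypothetical, and the affinity f is assumed to be normalized to [0, 1] with 1 the best.

```ruby
# Number of clones per selected antibody: N_c = round(beta * N).
def clone_count(pop_size, clone_rate)
  (clone_rate * pop_size).round
end

# Per-component mutation probability: exp(-rho * f), f being the normalized affinity.
def mutation_probability(affinity, mutation_rate)
  Math.exp(-mutation_rate * affinity)
end

puts clone_count(100, 0.1)
puts mutation_probability(0.0, 2.5)   # worst affinity: every component mutates
puts mutation_probability(1.0, 2.5)   # best affinity: mutation is rare
```

The exponential makes mutation pressure fall off quickly as affinity rises, so good candidates are refined gently while poor ones are changed heavily.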

7.2.7 Code Listing 

Listing 7.1 provides an example of the Clonal Selection Algorithm 
(CLONALG) implemented in the Ruby Programming Language. The demonstration 
problem is an instance of a continuous function optimization that seeks 
min f(x) where f = \sum_{i=1}^{n} x_i^2, -5.0 ≤ x_i ≤ 5.0 and n = 2. The optimal 
solution for this basin function is (v_0, ..., v_{n-1}) = 0.0. The algorithm 
is implemented as described by de Castro and Von Zuben for function 
optimization [8]. 
def objective_function(vector) 
  return vector.inject(0.0) {|sum, x| sum + (x**2.0)} 
end 

# convert a bitstring into a vector of real values within the search space 
def decode(bitstring, search_space, bits_per_param) 
  vector = [] 
  search_space.each_with_index do |bounds, i| 
    off, sum = i*bits_per_param, 0.0 
    param = bitstring[off...(off+bits_per_param)].reverse 
    param.size.times do |j| 
      sum += ((param[j].chr=='1') ? 1.0 : 0.0) * (2.0 ** j.to_f) 
    end 
    min, max = bounds 
    vector << min + ((max-min)/((2.0**bits_per_param.to_f)-1.0)) * sum 
  end 
  return vector 
end 

def evaluate(pop, search_space, bits_per_param) 
  pop.each do |p| 
    p[:vector] = decode(p[:bitstring], search_space, bits_per_param) 
    p[:cost] = objective_function(p[:vector]) 
  end 
end 

def random_bitstring(num_bits) 
  return (0...num_bits).inject(""){|s,i| s<<((rand<0.5) ? "1" : "0")} 
end 

# flip each bit independently with probability rate 
def point_mutation(bitstring, rate) 
  child = "" 
  bitstring.size.times do |i| 
    bit = bitstring[i].chr 
    child << ((rand()<rate) ? ((bit=='1') ? "0" : "1") : bit) 
  end 
  return child 
end 

def calculate_mutation_rate(antibody, mutate_factor=-2.5) 
  return Math.exp(mutate_factor * antibody[:affinity]) 
end 

def num_clones(pop_size, clone_factor) 
  return (pop_size * clone_factor).floor 
end 

def calculate_affinity(pop) 
  pop.sort!{|x,y| x[:cost]<=>y[:cost]} 
  range = pop.last[:cost] - pop.first[:cost] 
  if range == 0.0 
    pop.each {|p| p[:affinity] = 1.0} 
  else 
    pop.each {|p| p[:affinity] = 1.0-(p[:cost]/range)} 
  end 
end 
def clone_and_hypermutate(pop, clone_factor) 
  clones = [] 
  num_clones = num_clones(pop.size, clone_factor) 
  calculate_affinity(pop) 
  pop.each do |antibody| 
    m_rate = calculate_mutation_rate(antibody) 
    num_clones.times do 
      clone = {} 
      clone[:bitstring] = point_mutation(antibody[:bitstring], m_rate) 
      clones << clone 
    end 
  end 
  return clones 
end 

# add num_rand random antibodies, then keep the best pop.size members 
def random_insertion(search_space, pop, num_rand, bits_per_param) 
  return pop if num_rand == 0 
  rands = Array.new(num_rand) do |i| 
    {:bitstring=>random_bitstring(search_space.size*bits_per_param)} 
  end 
  evaluate(rands, search_space, bits_per_param) 
  return (pop+rands).sort{|x,y| x[:cost]<=>y[:cost]}.first(pop.size) 
end 

def search(search_space, max_gens, pop_size, clone_factor, num_rand, 
    bits_per_param=16) 
  pop = Array.new(pop_size) do |i| 
    {:bitstring=>random_bitstring(search_space.size*bits_per_param)} 
  end 
  evaluate(pop, search_space, bits_per_param) 
  best = pop.min{|x,y| x[:cost]<=>y[:cost]} 
  max_gens.times do |gen| 
    clones = clone_and_hypermutate(pop, clone_factor) 
    evaluate(clones, search_space, bits_per_param) 
    pop = (pop+clones).sort{|x,y| x[:cost]<=>y[:cost]}.first(pop_size) 
    pop = random_insertion(search_space, pop, num_rand, bits_per_param) 
    best = (pop + [best]).min{|x,y| x[:cost]<=>y[:cost]} 
    puts " > gen #{gen+1}, f=#{best[:cost]}, s=#{best[:vector].inspect}" 
  end 
  return best 
end 

if __FILE__ == $0 
  # problem configuration 
  problem_size = 2 
  search_space = Array.new(problem_size) {|i| [-5, +5]} 
  # algorithm configuration 
  max_gens = 100 
  pop_size = 100 
  clone_factor = 0.1 
  num_rand = 2 
  # execute the algorithm 
  best = search(search_space, max_gens, pop_size, clone_factor, num_rand) 
  puts "done! Solution: f=#{best[:cost]}, s=#{best[:vector].inspect}" 
end 
Listing 7.1: CLONALG in Ruby 



7.2.8 References 

Primary Sources 

Hidden at the back of a technical report on the applications of Artificial 
Immune Systems, de Castro and Von Zuben [6] proposed the Clonal Selection 
Algorithm (CSA) as a computational realization of the clonal selection 
principle for pattern matching and optimization. The algorithm was later 
published [7] and investigated further, where it was renamed CLONALG 
(CLONal selection ALGorithm) [8]. 

Learn More 

Watkins et al. proposed to exploit the inherent distributedness of the 
CLONALG and proposed a parallel version of the pattern recognition 
version of the algorithm [10]. White and Garrett also investigated the 
pattern recognition version of CLONALG and generalized the approach for 
the task of binary pattern classification renaming it to Clonal Classification 
(CLONCLAS) where their approach was compared to a number of simple 
Hamming distance based heuristics [11]. In an attempt to address concerns 
of algorithm efficiency, parameterization, and representation selection for 
continuous function optimization Garrett proposed an updated version of 
CLONALG called Adaptive Clonal Selection (ACS) [9]. In their book, de 
Castro and Timmis provide a detailed treatment of CLONALG including 
a description of the approach (starting page 79) and a step through of 
the algorithm (starting page 99) [5]. Cutello and Nicosia provide a study 
of the clonal selection principle and algorithms inspired by the theory [4]. 
Brownlee provides a review of Clonal Selection algorithms providing a 
taxonomy, algorithm reviews, and a broader bibliography [1]. 
7.3 Negative Selection Algorithm 

Negative Selection Algorithm, NSA. 

7.3.1 Taxonomy 

The Negative Selection Algorithm belongs to the field of Artificial Immune 
Systems. The algorithm is related to other Artificial Immune Systems such 
as the Clonal Selection Algorithm (Section 7.2), and the Immune Network 
Algorithm (Section 7.5). 

7.3.2 Inspiration 

The Negative Selection algorithm is inspired by the self-nonself discrimina- 
tion behavior observed in the mammalian acquired immune system. The 
clonal selection theory of acquired immunity accounts for the adaptive behav- 
ior of the immune system including the ongoing selection and proliferation 
of cells that select-for potentially harmful (and typically foreign) material 
in the body. An interesting aspect of this process is that it is responsible 
for managing a population of immune cells that do not select-for the tissues 
of the body; specifically, it does not create self-reactive immune cells, a 
condition known as auto-immunity. This problem is known as 'self-nonself 
discrimination' and it involves the preparation and ongoing maintenance of a 
repertoire of immune cells such that none are auto-immune. This is achieved by a 
negative selection process that selects-for and removes those cells that are 
self-reactive during cell creation and cell proliferation. This process has 
been observed in the preparation of T-lymphocytes, naive versions of which 
are matured using both a positive and negative selection process in the 
thymus. 

7.3.3 Metaphor 

The self-nonself discrimination principle suggests that the anticipatory 
guesses made in clonal selection are filtered by regions of infeasibility (pro- 
tein conformations that bind to self-tissues). Further, the self-nonself 
immunological paradigm proposes the modeling of the unknown domain 
(encountered pathogen) by modeling the complement of what is known. This 
is unintuitive as the natural inclination is to categorize unknown information 
by what is different from that which is known, rather than guessing at the 
unknown information and filtering those guesses by what is known. 

7.3.4 Strategy 

The information processing principles of the self-nonself discrimination 
process via negative selection are those of anomaly and change detection 
systems that model the anticipation of variation from what is known. The 
principle is achieved by building a model of changes, anomalies, or unknown 
(non-normal or non-self) data by generating patterns that do not match 
an existing corpus of available (self or normal) patterns. The prepared 
non-normal model is then used to either monitor the existing normal data 
or streams of new data by seeking matches to the non-normal patterns. 



7.3.5 Procedure 

Algorithm 7.3.1 provides a pseudocode listing of the detector generation 
procedure for the Negative Selection Algorithm. Algorithm 7.3.2 provides a 
pseudocode listing of the detector application procedure for the Negative 
Selection Algorithm. 



Algorithm 7.3.1: Pseudocode for detector generation. 

Input: SelfData 
Output: Repertoire 

Repertoire ← ∅; 
while ¬StopCondition() do 
    Detectors ← GenerateRandomDetectors(); 
    foreach Detector_i ∈ Detectors do 
        if ¬Matches(Detector_i, SelfData) then 
            Repertoire ← Detector_i; 
        end 
    end 
end 
return Repertoire; 



Algorithm 7.3.2: Pseudocode for detector application. 

Input: InputSamples, Repertoire 

for Input_i ∈ InputSamples do 
    Input_i_class ← "self"; 
    foreach Detector_i ∈ Repertoire do 
        if Matches(Input_i, Detector_i) then 
            Input_i_class ← "non-self"; 
            Break; 
        end 
    end 
end 
7.3.6 Heuristics 

• The Negative Selection Algorithm was designed for change detection, 
novelty detection, intrusion detection and similar pattern recognition 
and two-class classification problem domains. 

• Traditional negative selection algorithms used binary representations 
and binary matching rules such as Hamming distance, and r-contiguous 
bits. 

• A data representation should be selected that is most suitable for 
a given problem domain, and a matching rule is in turn selected or 
tailored to the data representation. 

• Detectors can be prepared with no prior knowledge of the problem 
domain other than the known (normal or self) dataset. 

• The algorithm can be configured to balance between detector conver- 
gence (quality of the matches) and the space complexity (number of 
detectors). 

• The lack of dependence between detectors means that detector prepa- 
ration and application is inherently parallel and suited for a distributed 
and parallel implementation, respectively. 
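
The two binary matching rules named in the heuristics above can be sketched as follows. These are illustrative definitions under stated assumptions: the function names are hypothetical, and both strings are assumed to be equal-length sequences of "0" and "1" characters.

```ruby
# Hamming distance: number of positions at which two equal-length bitstrings differ.
def hamming_distance(a, b)
  a.chars.zip(b.chars).count { |x, y| x != y }
end

# r-contiguous bits rule: match if the strings agree in at least r consecutive positions.
def r_contiguous_match?(a, b, r)
  run = 0
  a.chars.zip(b.chars).each do |x, y|
    run = (x == y) ? run + 1 : 0   # extend the current run or reset it
    return true if run >= r
  end
  false
end

puts hamming_distance("10110", "00111")
puts r_contiguous_match?("10110", "00111", 3)
puts r_contiguous_match?("10110", "00111", 4)
```

A larger r makes each detector more specific, so more detectors are needed to cover the same non-self space.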

7.3.7 Code Listing 

Listing 7.2 provides an example of the Negative Selection Algorithm imple- 
mented in the Ruby Programming Language. The demonstration problem 
is a two-class classification problem where samples are drawn from a two- 
dimensional domain, where x_i ∈ [0, 1]. Those samples in 1.0 > x_i > 0.5 
are classified as self and the rest of the space belongs to the non-self class. 
Samples are drawn from the self class and presented to the algorithm for 
the preparation of pattern detectors for classifying unobserved samples from 
the non-self class. The algorithm creates a set of detectors that do not 
match the self data, and are then applied to a set of randomly generated 
samples from the domain. The algorithm uses a real-valued representation. 
The Euclidean distance function is used during matching and a minimum 
distance value is specified as a user parameter for approximate matches 
between patterns. The algorithm includes the additional computationally 
expensive check for duplicates in the preparation of the self dataset and the 
detector set. 

def random_vector(minmax) 
  return Array.new(minmax.length) do |i| 
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand()) 
  end 
end 

def euclidean_distance(c1, c2) 
  sum = 0.0 
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0} 
  return Math.sqrt(sum) 
end 

def contains?(vector, space) 
  vector.each_with_index do |v,i| 
    return false if v<space[i][0] or v>space[i][1] 
  end 
  return true 
end 

# true if the vector lies within min_dist of any pattern in the dataset 
def matches?(vector, dataset, min_dist) 
  dataset.each do |pattern| 
    dist = euclidean_distance(vector, pattern[:vector]) 
    return true if dist <= min_dist 
  end 
  return false 
end 

def generate_detectors(max_detectors, search_space, self_dataset, min_dist) 
  detectors = [] 
  begin 
    detector = {:vector=>random_vector(search_space)} 
    # keep only detectors that do not match self and are not duplicates 
    if !matches?(detector[:vector], self_dataset, min_dist) 
      detectors << detector if !matches?(detector[:vector], detectors, 0.0) 
    end 
  end while detectors.size < max_detectors 
  return detectors 
end 

def generate_self _dataset (num_records , self_space, search_space) 
self_dataset = [] 
begin 

pattern = {}■ 

pattern [: vector] = random_vector (search_space) 

next if matches? (pattern [: vector] , self .dataset , 0.0) 

if contains? (pattern [: vector] , self .space) 

self.dataset « pattern 
end 

end while self.dataset. length < num.records 
return self.dataset 
end 

# Classify random samples with the detectors and report the accuracy.
def apply_detectors(detectors, bounds, self_dataset, min_dist, trials=50)
  correct = 0
  trials.times do |i|
    input = {:vector=>random_vector(bounds)}
    actual = matches?(input[:vector], detectors, min_dist) ? "N" : "S"
    expected = matches?(input[:vector], self_dataset, min_dist) ? "S" : "N"
    correct += 1 if actual==expected
    puts "#{i+1}/#{trials}: predicted=#{actual}, expected=#{expected}"
  end
  puts "Done. Result: #{correct}/#{trials}"
  return correct
end

# Prepare the self dataset, generate the detectors and test them.
def execute(bounds, self_space, max_detect, max_self, min_dist)
  self_dataset = generate_self_dataset(max_self, self_space, bounds)
  puts "Done: prepared #{self_dataset.size} self patterns."
  detectors = generate_detectors(max_detect, bounds, self_dataset, min_dist)
  puts "Done: prepared #{detectors.size} detectors."
  apply_detectors(detectors, bounds, self_dataset, min_dist)
  return detectors
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {[0.0, 1.0]}
  self_space = Array.new(problem_size) {[0.5, 1.0]}
  max_self = 150
  # algorithm configuration
  max_detectors = 300
  min_dist = 0.05
  # execute the algorithm
  execute(search_space, self_space, max_detectors, max_self, min_dist)
end

Listing 7.2: Negative Selection Algorithm in Ruby 
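
As a small illustration of the matching rule used in Listing 7.2, the following sketch labels a sample as non-self when it lies within a minimum distance of any detector. The detector positions, the sample points, and the helper name `classify` are hypothetical values chosen for illustration; they are not output from the listing.

```ruby
# Euclidean distance between two real-valued vectors.
def euclidean_distance(c1, c2)
  Math.sqrt(c1.zip(c2).sum { |a, b| (a - b)**2 })
end

# A sample is labelled non-self ("N") when any detector lies within min_dist,
# otherwise it is labelled self ("S").
def classify(sample, detectors, min_dist)
  detected = detectors.any? { |d| euclidean_distance(sample, d) <= min_dist }
  detected ? "N" : "S"
end

# Hypothetical detectors placed in the non-self region.
detectors = [[0.20, 0.20], [0.40, 0.10]]
min_dist = 0.05

puts classify([0.22, 0.23], detectors, min_dist)  # within 0.05 of a detector
puts classify([0.75, 0.80], detectors, min_dist)  # far from every detector
```

Note that a sample is only flagged when a detector happens to cover it, so the accuracy of the approach depends on how well the detector set covers the non-self space for the chosen minimum distance.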

7.3.8 References 

Primary Sources 

The seminal negative selection algorithm was proposed by Forrest et al. [5],
in which a population of detectors is prepared in the presence of known
information, where those randomly generated detectors that match against 
known data are discarded. The population of pattern guesses in the unknown 
space then monitors the corpus of known information for changes. The 
algorithm was applied to the monitoring of files for changes (corruptions and 
infections by computer viruses), and later formalized as a change detection 
algorithm [2, 3]. 

Learn More 

The Negative Selection algorithm has been applied to the monitoring of 
changes in the execution behavior of Unix processes [4, 8], and to monitor 
changes in remote connections of a network computer (intrusion detection) 
[6, 7]. The application of the algorithm has been predominantly to virus 
host intrusion detection and their abstracted problems of classification 
(two-class) and anomaly detection. Esponda provides some interesting work
showing some compression and privacy benefits provided by maintaining
a negative model (non-self) [1]. Ji and Dasgupta provide a contemporary
and detailed review of Negative Selection Algorithms covering topics such
as data representations, matching rules, detector generation procedures,
computational complexity, hybridization, and theoretical frameworks [9].
Recently, the validity of the application of negative selection algorithms in
high-dimensional spaces has been questioned, specifically given the scalability
of the approach in the face of the exponential increase in volume within the
problem space [10].

7.4 Artificial Immune Recognition System

Artificial Immune Recognition System, AIRS. 

7.4.1 Taxonomy 

The Artificial Immune Recognition System belongs to the field of Artificial 
Immune Systems, and more broadly to the field of Computational Intelligence.
It was extended early to the canonical version called the Artificial
Immune Recognition System 2 (AIRS2) and provides the basis for extensions 
such as the Parallel Artificial Immune Recognition System [8]. It is related 
to other Artificial Immune System algorithms such as the Dendritic Cell 
Algorithm (Section 7.6), the Clonal Selection Algorithm (Section 7.2), and 
the Negative Selection Algorithm (Section 7.3). 

7.4.2 Inspiration 

The Artificial Immune Recognition System is inspired by the Clonal Selection 
theory of acquired immunity. The clonal selection theory credited to Burnet 
was proposed to account for the behavior and capabilities of antibodies 
in the acquired immune system [1, 2]. Inspired itself by the principles of 
Darwinian natural selection theory of evolution, the theory proposes that 
antigens select for lymphocytes (both B and T-cells). When a lymphocyte is
selected and binds to an antigenic determinant, the cell proliferates making 
many thousands more copies of itself and differentiates into different cell 
types (plasma and memory cells). Plasma cells have a short lifespan and 
produce vast quantities of antibody molecules, whereas memory cells live 
for an extended period in the host anticipating future recognition of the 
same determinant. The important feature of the theory is that when a cell 
is selected and proliferates, it is subjected to small copying errors (changes 
to the genome called somatic hypermutation) that change the shape of the 
expressed receptors. These changes also affect the subsequent determinant
recognition capabilities of both the antibodies bound to the lymphocyte's
cell surface, and the antibodies that plasma cells produce.

7.4.3 Metaphor 

The theory suggests that starting with an initial repertoire of general immune 
cells, the system is able to change itself (the compositions and densities of 
cells and their receptors) in response to experience with the environment. 
Through a blind process of selection and accumulated variation on the large 
scale of many billions of cells, the acquired immune system is capable of 
acquiring the necessary information to protect the host organism from the 
specific pathogenic dangers of the environment. It also suggests that the
system must anticipate (guess at) the pathogens to which it will be exposed,
and requires exposure to pathogens that may harm the host before it can
acquire the necessary information to provide a defense.

7.4.4 Strategy 

The information processing objective of the technique is to prepare a set of
real-valued vectors to classify patterns. The Artificial Immune Recognition
System maintains a pool of memory cells that are prepared by exposing 
the system to a single iteration of the training data. Candidate memory 
cells are prepared when the memory cells are insufficiently stimulated for 
a given input pattern. A process of cloning and mutation of cells occurs 
for the most stimulated memory cell. The clones compete with each other 
for entry into the memory pool based on stimulation and on the amount of 
resources each cell is using. This concept of resources comes from prior work 
on Artificial Immune Networks, where a single cell (an Artificial Recognition 
Ball or ARB) represents a set of similar cells. Here, a cell's resources are 
a function of its stimulation to a given input pattern and the number of 
clones it may create. 
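
The competition for resources described above can be sketched as follows. This is a minimal illustration under two assumptions: that a cell's resources are its stimulation multiplied by the clone rate, and that the least-resourced cells are removed first until the pool fits within a maximum. The function name `trim_to_resources` and the sample values are hypothetical.

```ruby
# Allocate resources to each cell and remove the weakest cells until the
# pool's total resources fit within max_res.
def trim_to_resources(pool, clone_rate, max_res)
  pool.each { |cell| cell[:resources] = cell[:stimulation] * clone_rate }
  # strongest first, so the weakest cell is always last
  pool.sort_by! { |cell| -cell[:resources] }
  total = pool.sum { |cell| cell[:resources] }
  while total > max_res && !pool.empty?
    total -= pool.pop[:resources]
  end
  pool
end

# Four hypothetical cells with stimulation values in [0, 1].
pool = [0.9, 0.5, 0.7, 0.2].map { |s| { stimulation: s } }
survivors = trim_to_resources(pool, 10, 20)
puts survivors.map { |c| c[:stimulation] }.inspect
```

With a clone rate of 10 and a limit of 20, the total allocation of roughly 23 exceeds the limit, so the two weakest cells are removed and the cells with stimulation 0.9 and 0.7 remain.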

7.4.5 Procedure 

Algorithm 7.4.1 provides a high-level pseudocode for preparing memory cell
vectors using the Artificial Immune Recognition System, specifically the
canonical AIRS2. An affinity (distance) measure between input patterns
must be defined. For real-valued vectors, this is commonly the Euclidean
distance:

$$dist(x, c) = \sqrt{\sum_{i=1}^{n} (x_i - c_i)^2} \qquad (7.1)$$

where $n$ is the number of attributes, $x$ is the input vector and $c$ is a given
cell vector. The variation of cells during cloning (somatic hypermutation)
is inversely proportional to the stimulation of a given cell by an input
pattern.

Algorithm 7.4.1: Pseudocode for AIRS2.

Input: InputPatterns, clone_rate, mutate_rate, stim_thresh,
       resources_max, affinity_thresh
Output: Cells_memory

Cells_memory ← InitializeMemoryPool(InputPatterns);
foreach InputPattern_i ∈ InputPatterns do
    Stimulate(Cells_memory, InputPatterns);
    Cell_best ← GetMostStimulated(InputPattern_i, Cells_memory);
    if Cell_best^class ≠ InputPattern_i^class then
        Cells_memory ← CreateNewMemoryCell(InputPattern_i);
    else
        Clones_num ← Cell_best^stim × clone_rate × mutate_rate;
        for i to Clones_num do
            Cells_clones ← CloneAndMutate(Cell_best);
        end
        while AverageStimulation(Cells_clones) < stim_thresh do
            foreach Cell_i ∈ Cells_clones do
                Cells_clones ← CloneAndMutate(Cell_i);
            end
            Stimulate(Cells_clones, InputPatterns);
            ReducePoolToMaximumResources(Cells_clones, resources_max);
        end
        Cell_c ← GetMostStimulated(InputPattern_i, Cells_clones);
        if Cell_c^stim > Cell_best^stim then
            Cells_memory ← Cell_c;
            if Affinity(Cell_c, Cell_best) < affinity_thresh then
                DeleteCell(Cell_best, Cells_memory);
            end
        end
    end
end
return Cells_memory;

7.4.6 Heuristics

• The AIRS was designed as a supervised algorithm for classification
problem domains.

• The AIRS is non-parametric, meaning that it does not rely on assumptions
about the structure of the function that it is approximating.

• Real values in input vectors should be normalized such that $x \in [0, 1)$.

• Euclidean distance is commonly used to measure the distance between
real-valued vectors (affinity calculation), although other distance
measures may be used (such as dot product), and data specific distance
measures may be required for non-scalar attributes.

• Cells may be initialized with small random values or more commonly
with values from instances in the training set.

• A cell's affinity is typically minimizing, whereas a cell's stimulation is
maximizing and typically $\in [0, 1]$.

7.4.7 Code Listing 

Listing 7.3 provides an example of the Artificial Immune Recognition System
implemented in the Ruby Programming Language. The problem is a
contrived classification problem in a 2-dimensional domain $x \in [0, 1]$,
$y \in [0, 1]$ with two classes: 'A' ($x \in [0, 0.4999999]$, $y \in [0, 0.4999999]$)
and 'B' ($x \in [0.5, 1]$, $y \in [0.5, 1]$).

The algorithm is an implementation of the AIRS2 algorithm [7]. An
initial pool of memory cells is created, one cell for each class. Euclidean
distance divided by the maximum possible distance in the domain is taken as
the affinity and stimulation is taken as $1.0 - affinity$. The meta-dynamics
for memory cells (competition for input patterns) is not performed and may
be added into the implementation as an extension.
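
The affinity and stimulation calculation described above can be worked through on a few points. This is a minimal sketch assuming the unit square as the domain, so that the maximum possible distance is the diagonal from (0, 0) to (1, 1); the helper name `stimulation` and the sample vectors are chosen for illustration only.

```ruby
# Euclidean distance between two real-valued vectors.
def distance(c1, c2)
  Math.sqrt(c1.zip(c2).sum { |a, b| (a - b)**2 })
end

# Affinity is the distance scaled by the longest distance in the unit
# square; stimulation is its complement.
def stimulation(cell_vector, pattern_vector)
  max_dist = distance([0.0, 0.0], [1.0, 1.0])
  affinity = distance(cell_vector, pattern_vector) / max_dist
  1.0 - affinity
end

puts stimulation([0.25, 0.25], [0.25, 0.25])  # identical vectors
puts stimulation([0.0, 0.0], [1.0, 1.0])      # opposite corners
puts stimulation([0.2, 0.2], [0.5, 0.6])      # distance of 0.5
```

Identical vectors give the maximum stimulation of 1.0, opposite corners of the domain give 0.0, and the third pair gives approximately 0.646 because a distance of 0.5 is about 35% of the diagonal.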

# Create a random vector within the bounds given as [min, max] pairs.
def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

# Draw a random class and a random vector from that class's region.
def generate_random_pattern(domain)
  class_label = domain.keys[rand(domain.keys.size)]
  pattern = {:label=>class_label}
  pattern[:vector] = random_vector(domain[class_label])
  return pattern
end

def create_cell(vector, class_label)
  return {:label=>class_label, :vector=>vector}
end

# Create one randomly positioned memory cell for each class.
def initialize_cells(domain)
  mem_cells = []
  domain.keys.each do |key|
    mem_cells << create_cell(random_vector([[0,1],[0,1]]), key)
  end
  return mem_cells
end

# Euclidean distance between two vectors.
def distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

# Set the normalized affinity and the stimulation of each cell.
def stimulate(cells, pattern)
  max_dist = distance([0.0,0.0], [1.0,1.0])
  cells.each do |cell|
    cell[:affinity] = distance(cell[:vector], pattern[:vector]) / max_dist
    cell[:stimulation] = 1.0 - cell[:affinity]
  end
end

def get_most_stimulated_cell(mem_cells, pattern)
  stimulate(mem_cells, pattern)
  return mem_cells.sort{|x,y| y[:stimulation] <=> x[:stimulation]}.first
end

# Mutate each attribute within a window that shrinks as stimulation grows.
def mutate_cell(cell, best_match)
  range = 1.0 - best_match[:stimulation]
  cell[:vector].each_with_index do |v,i|
    min = [(v-(range/2.0)), 0.0].max
    max = [(v+(range/2.0)), 1.0].min
    cell[:vector][i] = min + (rand() * (max-min))
  end
  return cell
end

# Create a pool containing the best match and its mutated clones.
def create_arb_pool(pattern, best_match, clone_rate, mutate_rate)
  pool = []
  # copy the vector so that mutating a clone does not alter its parent
  pool << create_cell(best_match[:vector].dup, best_match[:label])
  num_clones = (best_match[:stimulation] * clone_rate * mutate_rate).round
  num_clones.times do
    cell = create_cell(best_match[:vector].dup, best_match[:label])
    pool << mutate_cell(cell, best_match)
  end
  return pool
end

# Allocate resources and remove the least stimulated cells until the
# pool fits within the maximum resources.
def competition_for_resources(pool, clone_rate, max_res)
  pool.each {|cell| cell[:resources] = cell[:stimulation] * clone_rate}
  # sort in descending order so that the weakest cell is last
  pool.sort!{|x,y| y[:resources] <=> x[:resources]}
  total_resources = pool.inject(0.0){|sum,cell| sum + cell[:resources]}
  while total_resources > max_res
    cell = pool.delete_at(pool.size-1)
    total_resources -= cell[:resources]
  end
end

# Clone and mutate the pool until its mean stimulation is high enough.
def refine_arb_pool(pool, pattern, stim_thresh, clone_rate, max_res)
  mean_stim, candidate = 0.0, nil
  begin
    stimulate(pool, pattern)
    candidate = pool.sort{|x,y| y[:stimulation] <=> x[:stimulation]}.first
    mean_stim = pool.inject(0.0){|s,c| s + c[:stimulation]} / pool.size
    if mean_stim < stim_thresh
      competition_for_resources(pool, clone_rate, max_res)
      pool.size.times do |i|
        cell = create_cell(pool[i][:vector].dup, pool[i][:label])
        mutate_cell(cell, pool[i])
        pool << cell
      end
    end
  end until mean_stim >= stim_thresh
  return candidate
end

def add_candidate_to_memory_pool(candidate, best_match, mem_cells)
  if candidate[:stimulation] > best_match[:stimulation]
    mem_cells << candidate
  end
end

def classify_pattern(mem_cells, pattern)
  stimulate(mem_cells, pattern)
  return mem_cells.sort{|x,y| y[:stimulation] <=> x[:stimulation]}.first
end

# Expose the memory pool to a stream of random training patterns.
def train_system(mem_cells, domain, num_patterns, clone_rate, mutate_rate,
    stim_thresh, max_res)
  num_patterns.times do |i|
    pattern = generate_random_pattern(domain)
    best_match = get_most_stimulated_cell(mem_cells, pattern)
    if best_match[:label] != pattern[:label]
      mem_cells << create_cell(pattern[:vector], pattern[:label])
    elsif best_match[:stimulation] < 1.0
      pool = create_arb_pool(pattern, best_match, clone_rate, mutate_rate)
      cand = refine_arb_pool(pool, pattern, stim_thresh, clone_rate, max_res)
      add_candidate_to_memory_pool(cand, best_match, mem_cells)
    end
    puts " > iter=#{i+1}, mem_cells=#{mem_cells.size}"
  end
end

# Classify random patterns and report the number of correct labels.
def test_system(mem_cells, domain, num_trials=50)
  correct = 0
  num_trials.times do
    pattern = generate_random_pattern(domain)
    best = classify_pattern(mem_cells, pattern)
    correct += 1 if best[:label] == pattern[:label]
  end
  puts "Finished test with a score of #{correct}/#{num_trials}"
  return correct
end

def execute(domain, num_patterns, clone_rate, mutate_rate, stim_thresh,
    max_res)
  mem_cells = initialize_cells(domain)
  train_system(mem_cells, domain, num_patterns, clone_rate, mutate_rate,
    stim_thresh, max_res)
  test_system(mem_cells, domain)
  return mem_cells
end

if __FILE__ == $0
  # problem configuration
  domain = {"A"=>[[0,0.4999999],[0,0.4999999]],"B"=>[[0.5,1],[0.5,1]]}
  num_patterns = 50
  # algorithm configuration
  clone_rate = 10
  mutate_rate = 2.0
  stim_thresh = 0.9
  max_res = 150
  # execute the algorithm
  execute(domain, num_patterns, clone_rate, mutate_rate, stim_thresh,
    max_res)
end

Listing 7.3: AIRS in Ruby 
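
To make the relationship between stimulation, cloning, and mutation in Listing 7.3 concrete, the sketch below computes the number of clones and the mutation window for a few stimulation values. The helper names `num_clones` and `mutation_window` are hypothetical; the arithmetic is assumed to mirror the listing, where the window width is one minus the stimulation and is clipped to the unit interval.

```ruby
# The number of clones grows with the stimulation of the best matching cell.
def num_clones(stimulation, clone_rate, mutate_rate)
  (stimulation * clone_rate * mutate_rate).round
end

# Each attribute may move within a window whose width shrinks as the
# stimulation approaches 1.0; the window is clipped to [0, 1].
def mutation_window(value, stimulation)
  range = 1.0 - stimulation
  [[value - range / 2.0, 0.0].max, [value + range / 2.0, 1.0].min]
end

puts num_clones(0.9, 10, 2.0)             # highly stimulated cell
puts mutation_window(0.5, 0.9).inspect    # narrow window around 0.5
puts mutation_window(0.05, 0.5).inspect   # wide window, clipped at 0.0
```

A cell with a stimulation of 0.9 produces 18 clones that each vary by at most about 0.05 per attribute, whereas a poorly stimulated cell produces fewer clones that vary more widely.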



7.4.8 References 

Primary Sources 

The Artificial Immune Recognition System was proposed in the Masters 
work by Watkins [10], and later published [11]. Early works included the 
application of the AIRS by Watkins and Boggess to a suite of benchmark 
classification problems [6], and a similar study by Goodman and Boggess 
comparing to a conceptually similar approach called Learning Vector Quan- 
tization [3]. 

Learn More 

Marwah and Boggess investigated the algorithm seeking issues that affect
the algorithm's performance [5]. They compared several variations of the
algorithm with modified resource allocation schemes, tie-handling within
the ARB pool, and ARB pool organization. Watkins and Timmis proposed
a new version of the algorithm called AIRS2 which became the replacement
for AIRS1 [7]. The updates reduced the complexity of the approach while
maintaining the accuracy of the results. An investigation by Goodman et al.
into the so-called 'source of power' in AIRS indicated that perhaps the
memory cell maintenance procedures played an important role [4]. Watkins
et al. provide a detailed review of the technique and its application [9].

7.5 Immune Network Algorithm

Artificial Immune Network, aiNet, Optimization Artificial Immune Network, 
opt-aiNet. 

7.5.1 Taxonomy 

The Artificial Immune Network algorithm (aiNet) is an Immune Network
Algorithm from the field of Artificial Immune Systems. It is related to 
other Artificial Immune System algorithms such as the Clonal Selection 
Algorithm (Section 7.2), the Negative Selection Algorithm (Section 7.3), and 
the Dendritic Cell Algorithm (Section 7.6). The Artificial Immune Network 
algorithm includes the base version and the extension for optimization 
problems called the Optimization Artificial Immune Network algorithm 
(opt-aiNet). 

7.5.2 Inspiration 

The Artificial Immune Network algorithm is inspired by the Immune Network 
theory of the acquired immune system. The clonal selection theory of 
acquired immunity accounts for the adaptive behavior of the immune system 
including the ongoing selection and proliferation of cells that select-for 
potentially harmful (and typically foreign) material in the body. A concern 
of the clonal selection theory is that it presumes that the repertoire of 
reactive cells remains idle when there are no pathogen to which to respond. 
Jerne proposed an Immune Network Theory (Idiotypic Networks) where 
immune cells are not at rest in the absence of pathogen, instead antibody 
and immune cells recognize and respond to each other [6-8]. 

The Immune Network theory proposes that antibodies (both free-floating
and surface-bound) possess idiotopes (surface features) to which the receptors
of other antibodies can bind. As a result of receptor interactions, the repertoire
becomes dynamic, where receptors continually both inhibit and excite each 
other in complex regulatory networks (chains of receptors). The theory 
suggests that the clonal selection process may be triggered by the idiotopes 
of other immune cells and molecules in addition to the surface characteristics 
of pathogen, and that the maturation process applies both to the receptors 
themselves and the idiotopes which they expose. 

7.5.3 Metaphor 

The immune network theory has interesting resource maintenance and 
signaling information processing properties. The classical clonal selection 
and negative selection paradigms integrate the accumulative and filtered 
learning of the acquired immune system, whereas the immune network 
theory proposes an additional order of complexity between the cells and
molecules under selection. In addition to cells that interact directly with
pathogen, there are cells that interact with those reactive cells and with
pathogen indirectly, in successive layers, forming networks of activity for
higher-order structures such as internal images of pathogen (promotion),
and regulatory networks (so-called anti-idiotopes and anti-anti-idiotopes).

7.5.4 Strategy 

The objective of the immune network process is to prepare a repertoire
of discrete pattern detectors for a given problem domain, where better
performing cells suppress low-affinity (similar) cells in the network. This
principle is achieved through an interactive process of exposing the
population to external information to which it responds with both a clonal
selection response and internal meta-dynamics of intra-population responses
that stabilize the responses of the population to the external stimuli.

7.5.5 Procedure 

Algorithm 7.5.1 provides a pseudocode listing of the Optimization Artificial
Immune Network algorithm (opt-aiNet) for minimizing a cost function.

Algorithm 7.5.1: Pseudocode for opt-aiNet.

Input: Population_size, ProblemSize, N_clones, N_random, AffinityThreshold
Output: S_best

Population ← InitializePopulation(Population_size, ProblemSize);
while ¬StopCondition() do
    EvaluatePopulation(Population);
    S_best ← GetBestSolution(Population);
    Progeny ← ∅;
    Cost_avg ← CalculateAveragePopulationCost(Population);
    while CalculateAveragePopulationCost(Population) > Cost_avg do
        foreach Cell_i ∈ Population do
            Clones ← CreateClones(Cell_i, N_clones);
            foreach Clone_i ∈ Clones do
                Clone_i ← MutateRelativeToFitnessOfParent(Clone_i, Cell_i);
            end
            EvaluatePopulation(Clones);
            Progeny ← GetBestSolution(Clones);
        end
    end
    SuppressLowAffinityCells(Progeny, AffinityThreshold);
    Progeny ← CreateRandomCells(N_random);
    Population ← Progeny;
end
return S_best;

7.5.6 Heuristics

• aiNet is designed for unsupervised clustering, whereas the opt-aiNet
extension was designed for pattern recognition and optimization,
specifically multi-modal function optimization.

• The amount of mutation of clones is proportionate to the affinity of
the parent cell with the cost function (better fitness, lower mutation).

• The addition of random cells each iteration adds a random-restart like
capability to the algorithm.

• Suppression based on cell similarity provides a mechanism for reducing
redundancy.

• The population size is dynamic, and if it continues to grow it may be
an indication of a problem with many local optima or that the affinity
threshold may need to be increased.

• Affinity proportionate mutation is performed using
$c' = c + \alpha \times N(0, 1)$, where $\alpha = \frac{1}{\beta} \times \exp(-f)$,
$N(0, 1)$ is a Gaussian random number with zero mean and unit standard
deviation, $f$ is the fitness of the parent cell, and $\beta$ controls the
decay of the function and can be set to 100.

• The affinity threshold is problem and representation specific, for
example an AffinityThreshold may be set to an arbitrary value such
as 0.1 on a continuous function domain, or calculated as a percentage
of the size of the problem space.

• The number of random cells inserted may be 40% of the population
size.

• The number of clones created for a cell may be small, such as 10.
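
The affinity proportionate mutation heuristic can be illustrated by computing the step size for a few fitness values. This is a minimal sketch assuming that the fitness is normalized to [0, 1] with 1 being the best cell in the population, and that the decay parameter is set to 100; the variable names are chosen for illustration.

```ruby
# Step size for affinity proportionate mutation: alpha = (1/beta) * exp(-f),
# where f is the (assumed) normalized fitness of the parent cell.
def mutation_rate(beta, normalized_fitness)
  (1.0 / beta) * Math.exp(-normalized_fitness)
end

beta = 100.0
[0.0, 0.5, 1.0].each do |f|
  puts "f=#{f} alpha=#{mutation_rate(beta, f).round(5)}"
end
```

The step size falls from 0.01 for the worst cell to about 0.0037 for the best cell, so clones of better cells are perturbed less than clones of poorer cells.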



7.5.7 Code Listing 

Listing 7.4 provides an example of the Optimization Artificial Immune
Network (opt-aiNet) implemented in the Ruby Programming Language. The
demonstration problem is an instance of a continuous function optimization
that seeks $\min f(x)$ where $f = \sum_{i=1}^{n} x_i^2$, $-5.0 \leq x_i \leq 5.0$ and $n = 2$.
The optimal solution for this basin function is $(v_0, \ldots, v_{n-1}) = 0.0$. The
algorithm is an implementation based on the specification by de Castro and
Von Zuben [1].

# Sum of squares (basin function), minimized at the origin.
def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x**2.0)}
end

# Create a random vector within the bounds given as [min, max] pairs.
def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

# Gaussian random number using the polar form of the Box-Muller method.
def random_gaussian(mean=0.0, stdev=1.0)
  u1 = u2 = w = 0
  begin
    u1 = 2 * rand() - 1
    u2 = 2 * rand() - 1
    w = u1 * u1 + u2 * u2
  end while w >= 1
  w = Math.sqrt((-2.0 * Math.log(w)) / w)
  return mean + (u2 * w) * stdev
end

# Copy the vector of the parent into a new cell.
def clone(parent)
  v = Array.new(parent[:vector].size) {|i| parent[:vector][i]}
  return {:vector=>v}
end

def mutation_rate(beta, normalized_cost)
  return (1.0/beta) * Math.exp(-normalized_cost)
end

# Perturb each attribute with Gaussian noise scaled by the mutation rate.
def mutate(beta, child, normalized_cost)
  child[:vector].each_with_index do |v, i|
    alpha = mutation_rate(beta, normalized_cost)
    child[:vector][i] = v + alpha * random_gaussian()
  end
end

# Create mutated clones of the parent and return the best of them.
def clone_cell(beta, num_clones, parent)
  clones = Array.new(num_clones) {clone(parent)}
  clones.each {|clone| mutate(beta, clone, parent[:norm_cost])}
  clones.each{|c| c[:cost] = objective_function(c[:vector])}
  clones.sort!{|x,y| x[:cost] <=> y[:cost]}
  return clones.first
end

# Assign each cell a cost relative to the range of costs in the population.
def calculate_normalized_cost(pop)
  pop.sort!{|x,y| x[:cost]<=>y[:cost]}
  range = pop.last[:cost] - pop.first[:cost]
  if range == 0.0
    pop.each {|p| p[:norm_cost] = 1.0}
  else
    pop.each {|p| p[:norm_cost] = 1.0-(p[:cost]/range)}
  end
end

def average_cost(pop)
  sum = pop.inject(0.0){|sum,x| sum + x[:cost]}
  return sum / pop.size.to_f
end

# Euclidean distance between two vectors.
def distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

# Collect the cells that lie within the affinity threshold of the cell.
def get_neighborhood(cell, pop, aff_thresh)
  neighbors = []
  pop.each do |p|
    neighbors << p if distance(p[:vector], cell[:vector]) < aff_thresh
  end
  return neighbors
end

# Keep only the cells that are the best within their own neighborhood.
def affinity_suppress(population, aff_thresh)
  pop = []
  population.each do |cell|
    neighbors = get_neighborhood(cell, population, aff_thresh)
    neighbors.sort!{|x,y| x[:cost] <=> y[:cost]}
    pop << cell if neighbors.empty? or cell.equal?(neighbors.first)
  end
  return pop
end

def search(search_space, max_gens, pop_size, num_clones, beta, num_rand,
    aff_thresh)
  pop = Array.new(pop_size) {|i| {:vector=>random_vector(search_space)} }
  pop.each{|c| c[:cost] = objective_function(c[:vector])}
  best = nil
  max_gens.times do |gen|
    pop.each{|c| c[:cost] = objective_function(c[:vector])}
    calculate_normalized_cost(pop)
    pop.sort!{|x,y| x[:cost] <=> y[:cost]}
    best = pop.first if best.nil? or pop.first[:cost] < best[:cost]
    avgCost, progeny = average_cost(pop), nil
    # clone and mutate until the progeny improve on the average cost
    begin
      progeny=Array.new(pop.size){|i| clone_cell(beta, num_clones, pop[i])}
    end until average_cost(progeny) < avgCost
    pop = affinity_suppress(progeny, aff_thresh)
    num_rand.times {pop << {:vector=>random_vector(search_space)}}
    puts " > gen #{gen+1}, popSize=#{pop.size}, fitness=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_gens = 150
  pop_size = 20
  num_clones = 10
  beta = 100
  num_rand = 2
  aff_thresh = (search_space[0][1]-search_space[0][0])*0.05
  # execute the algorithm
  best = search(search_space, max_gens, pop_size, num_clones, beta,
    num_rand, aff_thresh)
  puts "done! Solution: f=#{best[:cost]}, s=#{best[:vector].inspect}"
end

Listing 7.4: Optimization Artificial Immune Network in Ruby 
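
The suppression step in Listing 7.4 can be examined in isolation on a small hand-made population. The sketch below is an illustration under the assumption that a cell survives only if it has the lowest cost among all cells closer to it than the affinity threshold; the function name `suppress`, the population, and the threshold of 0.5 are hypothetical.

```ruby
# Euclidean distance between two real-valued vectors.
def distance(c1, c2)
  Math.sqrt(c1.zip(c2).sum { |a, b| (a - b)**2 })
end

# Keep a cell only if it has the lowest cost among all cells closer than
# aff_thresh to it (a cell is always its own neighbor at distance 0).
def suppress(population, aff_thresh)
  population.select do |cell|
    neighbors = population.select do |other|
      distance(cell[:vector], other[:vector]) < aff_thresh
    end
    cell.equal?(neighbors.min_by { |n| n[:cost] })
  end
end

# Two near-duplicate cells close to the optimum and one distant cell,
# with costs given by the sum of squares of each vector.
population = [
  { vector: [0.10, 0.10], cost: 0.0200 },
  { vector: [0.12, 0.11], cost: 0.0265 },
  { vector: [2.00, -1.0], cost: 5.0000 }
]
kept = suppress(population, 0.5)
puts kept.map { |c| c[:vector] }.inspect
```

The second cell is removed because a better cell lies within the threshold, while the distant cell survives despite its poor cost, which is how the network retains candidates in different regions of a multi-modal function.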



7.5.8 References 

Primary Sources 

Early works, such as Farmer et al. [5], suggested the exploitation of the
information processing properties of network theory for machine learning. A
seminal network theory based algorithm was proposed by Timmis et al. for
clustering problems called the Artificial Immune Network (AIN) [11] that was
later extended and renamed the Resource Limited Artificial Immune System
[12] and Artificial Immune Network (AINE) [9]. The Artificial Immune
Network (aiNet) algorithm was proposed by de Castro and Von Zuben that
extended the principles of the Artificial Immune Network (AIN) and the
Clonal Selection Algorithm (CLONALG) and was applied to clustering [2].
The aiNet algorithm was further extended to optimization domains and
renamed opt-aiNet [1].

Learn More 

The authors de Castro and Von Zuben provide a detailed presentation of 
the aiNet algorithm as a book chapter that includes immunological theory, 
a description of the algorithm, and demonstration application to clustering 
problem instances [3]. Timmis and Edmonds provide a careful examination 
of the opt-aiNet algorithm and propose some modifications and augmenta- 
tions to improve its applicability and performance for multimodal function 
optimization problem domains [10]. The authors de Franca, Von Zuben,
and de Castro proposed an extension to opt-aiNet called dopt-aiNet that
provided a number of enhancements and adapted its capability for dynamic
function optimization problems [4].

7.6 Dendritic Cell Algorithm

Dendritic Cell Algorithm, DCA.



7.6.1 Taxonomy 

The Dendritic Cell Algorithm belongs to the field of Artificial Immune 
Systems, and more broadly to the field of Computational Intelligence. The 
Dendritic Cell Algorithm is the basis for extensions such as the Deterministic 
Dendritic Cell Algorithm (dDCA) [2]. It is generally related to other 
Artificial Immune System algorithms such as the Clonal Selection Algorithm 
(Section 7.2), and the Immune Network Algorithm (Section 7.5). 



7.6.2 Inspiration 

The Dendritic Cell Algorithm is inspired by the Danger Theory of the mam- 
malian immune system, and specifically the role and function of dendritic 
cells. The Danger Theory was proposed by Matzinger and suggests that
the role of the acquired immune system is to respond to signals of danger,
rather than discriminating self from non-self [7, 8]. The theory suggests
that antigen presenting cells (such as helper T-cells) activate an alarm
signal providing the necessary co-stimulation of antigen-specific cells to
respond. Dendritic cells are a type of cell from the innate immune system 
that respond to some specific forms of danger signals. There are three main 
types of dendritic cells: 'immature' that collect parts of the antigen and the 
signals, 'semi-mature' that are immature cells that internally decide that 
the local signals represent safe and present the antigen to T-cells resulting 
in tolerance, and 'mature' cells that internally decide that the local signals 
represent danger and present the antigen to T-cells resulting in a reactive 
response. 



7.6.3 Strategy 

The information processing objective of the algorithm is to prepare a set of 
mature dendritic cells (prototypes) that provide context specific information 
about how to classify normal and anomalous input patterns. This is achieved 
as a system of three asynchronous processes of 1) migrating sufficiently 
stimulated immature cells, 2) promoting migrated cells to semi-mature (safe) 
or mature (danger) status depending on their accumulated response, and 3) 
labeling observed patterns as safe or dangerous based on the composition of 
the sub-population of cells that respond to each pattern.

7.6.4 Procedure

Algorithm 7.6.1 provides pseudocode for training a pool of cells in the
Dendritic Cell Algorithm, specifically the Deterministic Dendritic Cell
Algorithm. Mature migrated cells associate their collected input patterns
with anomalies, whereas semi-mature migrated cells associate their collected
input patterns with normal behavior. The resulting migrated cells can then
be used to classify input patterns as normal or anomalous. This can be done
through sampling the cells and using a voting mechanism, or more elaborate
methods such as a 'mature context antigen value' (MCAV) that uses
$\frac{M}{Ag}$ (where $M$ is the number of mature cells with the antigen and
$Ag$ is the sum of the exposures to the antigen by those mature cells), which
gives a probability of a pattern being an anomaly.
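
The simpler voting mechanism mentioned above can be sketched as follows. This is an illustration only, assuming that each migrated cell records the antigens it collected and its final type, and that a pattern is labeled anomalous when more than half of the cells that collected it are mature. The field names, the 0.5 threshold, and the sample cells are hypothetical.

```ruby
# Fraction of the migrated cells that collected the antigen and matured.
# Returns nil when no migrated cell has seen the antigen.
def mature_fraction(antigen, migrated_cells)
  voters = migrated_cells.select { |cell| cell[:antigen].include?(antigen) }
  return nil if voters.empty?
  voters.count { |cell| cell[:type] == :mature }.to_f / voters.size
end

# Label an antigen by majority vote of the cells that collected it.
def label(antigen, migrated_cells, threshold = 0.5)
  fraction = mature_fraction(antigen, migrated_cells)
  return "Unknown" if fraction.nil?
  fraction > threshold ? "Anomaly" : "Normal"
end

migrated = [
  { type: :mature,      antigen: [10, 20] },
  { type: :mature,      antigen: [10] },
  { type: :semi_mature, antigen: [10, 7] },
  { type: :semi_mature, antigen: [7] }
]
puts label(10, migrated)  # two of three voters are mature
puts label(7, migrated)   # no mature voters
puts label(3, migrated)   # never collected by a migrated cell
```

Here the antigen 10 is labeled an anomaly, 7 is labeled normal, and 3 cannot be labeled because no migrated cell collected it.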



Algorithm 7.6.1: Pseudocode for the Dendritic Cell Algorithm.

Input: InputPatterns, iterations_max, cells_num, MigrationThresh_bounds
Output: MigratedCells

ImmatureCells ← InitializeCells(cells_num, MigrationThresh_bounds);
MigratedCells ← ∅;
for i = 1 to iterations_max do
    P_i ← SelectInputPattern(InputPatterns);
    k_i ← (P_i^danger − 2 × P_i^safe);
    cms_i ← (P_i^danger + P_i^safe);
    foreach Cell_i ∈ ImmatureCells do
        UpdateCellOutputSignals(Cell_i, k_i, cms_i);
        StoreAntigen(Cell_i, P_i^antigen);
        if Cell_i^lifespan < 0 then
            ReInitializeCell(Cell_i);
        else if Cell_i^csm ≥ Cell_i^thresh then
            RemoveCell(ImmatureCells, Cell_i);
            ImmatureCells ← CreateNewCell(MigrationThresh_bounds);
            if Cell_i^k > 0 then
                Cell_i^type ← Mature;
            else
                Cell_i^type ← Semimature;
            end
            MigratedCells ← Cell_i;
        end
    end
end
return MigratedCells;

7.6.5 Heuristics

• The Dendritic Cell Algorithm is not specifically a classification 
algorithm; it may be considered a data filtering method for use in 
anomaly detection problems. 

• The canonical algorithm is designed to operate on a single discrete, 
categorical or ordinal input and two probabilistic, problem-specific 
signals indicating the heuristic danger or safety of the input. 

• The danger and safe signals are problem-specific signals of the risk 
that the input pattern is an anomaly or is normal, both typically 
∈ [0, 100]. 

• The danger and safe signals do not have to be reciprocal, meaning 
they may provide conflicting information. 

• The system was designed to be used in real-time anomaly detection 
problems, not just static problems. 

• Each cell's migration threshold is set separately, typically ∈ [5, 15]. 
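To illustrate how non-reciprocal danger and safe signals combine inside a cell, the sketch below accumulates the two output signals used in Algorithm 7.6.1 (cms as the sum of the signals, k as danger minus twice safe). The helper names and the small signal values are assumptions chosen for readability.

```ruby
# cms accumulates total stimulation; k accumulates context
# (positive leans towards danger, negative towards safe).
def signal_outputs(danger, safe)
  cms = danger + safe
  k = danger - (2.0 * safe)
  [cms, k]
end

# add one pattern's signals to a cell (hypothetical hash layout)
def expose(cell, danger, safe)
  cms, k = signal_outputs(danger, safe)
  cell[:cms] += cms
  cell[:k] += k
  cell
end

cell = {:cms => 0.0, :k => 0.0, :threshold => 10.0} # threshold drawn from [5, 15]
expose(cell, 6.0, 1.0) # mostly danger: adds cms=7, k=4
expose(cell, 3.0, 2.0) # conflicting signals: adds cms=5, k=-1
ready = cell[:cms] >= cell[:threshold]
puts "cms=#{cell[:cms]} k=#{cell[:k]} migrate=#{ready}"
```

Because safe signals are weighted twice as heavily in k, a cell needs clearly dominant danger signals before it accumulates a positive context.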

7.6.6 Code Listing 

Listing 7.5 provides an example of the Dendritic Cell Algorithm implemented 
in the Ruby Programming Language, specifically the Deterministic Dendritic 
Cell Algorithm (dDCA). The problem is a contrived anomaly-detection 
problem with ordinal inputs x ∈ [0, 50), where values that divide by 
10 with no remainder are considered anomalies. Probabilistic safe and 
danger signal functions are provided, suggesting danger signals correctly 
with P(danger) = 0.70, and safe signals correctly with P(safe) = 0.95. 

The algorithm is an implementation of the Deterministic Dendritic Cell 
Algorithm (dDCA) as described in [2, 9], with verification from [5]. The 
algorithm was designed to be executed as three asynchronous processes in 
a real-time or semi-real-time environment. For demonstration purposes, 
the implementation separates out the three main processes and executes 
them sequentially as a training and cell promotion phase followed by a 
test (labeling) phase. 

def rand_in_bounds(min, max) 
  return min + ((max-min) * rand()) 
end 

def random_vector(search_space) 
  return Array.new(search_space.size) do |i| 
    rand_in_bounds(search_space[i][0], search_space[i][1]) 
  end 
end 

# build a labeled pattern with noisy safe and danger signals 
def construct_pattern(class_label, domain, p_safe, p_danger) 
  set = domain[class_label] 
  selection = rand(set.size) 
  pattern = {} 
  pattern[:class_label] = class_label 
  pattern[:input] = set[selection] 
  pattern[:safe] = (rand() * p_safe * 100) 
  pattern[:danger] = (rand() * p_danger * 100) 
  return pattern 
end 

def generate_pattern(domain, p_anomaly, p_normal, prob_create_anom=0.5) 
  pattern = nil 
  if rand() < prob_create_anom 
    pattern = construct_pattern("Anomaly", domain, 1.0-p_normal, p_anomaly) 
    puts ">Generated Anomaly [#{pattern[:input]}]" 
  else 
    pattern = construct_pattern("Normal", domain, p_normal, 1.0-p_anomaly) 
  end 
  return pattern 
end 

# create a new cell, or reset an existing cell in place 
def initialize_cell(thresh, cell={}) 
  cell[:lifespan] = 1000.0 
  cell[:k] = 0.0 
  cell[:cms] = 0.0 
  cell[:migration_threshold] = rand_in_bounds(thresh[0], thresh[1]) 
  cell[:antigen] = {} 
  return cell 
end 

# count how many times the cell has been exposed to each input 
def store_antigen(cell, input) 
  if cell[:antigen][input].nil? 
    cell[:antigen][input] = 1 
  else 
    cell[:antigen][input] += 1 
  end 
end 

def expose_cell(cell, cms, k, pattern, threshold) 
  cell[:cms] += cms 
  cell[:k] += k 
  cell[:lifespan] -= cms 
  store_antigen(cell, pattern[:input]) 
  initialize_cell(threshold, cell) if cell[:lifespan] <= 0 
end 

def can_cell_migrate?(cell) 
  return (cell[:cms]>=cell[:migration_threshold] and !cell[:antigen].empty?) 
end 

def expose_all_cells(cells, pattern, threshold) 
  migrate = [] 
  cms = (pattern[:safe] + pattern[:danger]) 
  k = pattern[:danger] - (pattern[:safe] * 2.0) 
  cells.each do |cell| 
    expose_cell(cell, cms, k, pattern, threshold) 
    if can_cell_migrate?(cell) 
      migrate << cell 
      # a positive accumulated context marks the cell as mature (danger) 
      cell[:class_label] = (cell[:k]>0) ? "Anomaly" : "Normal" 
    end 
  end 
  return migrate 
end 

def train_system(domain, max_iter, num_cells, p_anomaly, p_normal, thresh) 
  immature_cells = Array.new(num_cells){ initialize_cell(thresh) } 
  migrated = [] 
  max_iter.times do |iter| 
    pattern = generate_pattern(domain, p_anomaly, p_normal) 
    migrants = expose_all_cells(immature_cells, pattern, thresh) 
    # replace each migrated cell with a fresh immature cell 
    migrants.each do |cell| 
      immature_cells.delete(cell) 
      immature_cells << initialize_cell(thresh) 
      migrated << cell 
    end 
    puts "> iter=#{iter} new=#{migrants.size}, migrated=#{migrated.size}" 
  end 
  return migrated 
end 

def classify_pattern(migrated, pattern) 
  input = pattern[:input] 
  num_cells, num_antigen = 0, 0 
  migrated.each do |cell| 
    if cell[:class_label] == "Anomaly" and !cell[:antigen][input].nil? 
      num_cells += 1 
      num_antigen += cell[:antigen][input] 
    end 
  end 
  # MCAV; NaN when no mature cell has seen the input, which yields "Normal" 
  mcav = num_cells.to_f / num_antigen.to_f 
  return (mcav>0.5) ? "Anomaly" : "Normal" 
end 

def test_system(migrated, domain, p_anomaly, p_normal, num_trial=100) 
  correct_norm = 0 
  num_trial.times do 
    pattern = construct_pattern("Normal", domain, p_normal, 1.0-p_anomaly) 
    class_label = classify_pattern(migrated, pattern) 
    correct_norm += 1 if class_label == "Normal" 
  end 
  puts "Finished testing Normal inputs #{correct_norm}/#{num_trial}" 
  correct_anom = 0 
  num_trial.times do 
    pattern = construct_pattern("Anomaly", domain, 1.0-p_normal, p_anomaly) 
    class_label = classify_pattern(migrated, pattern) 
    correct_anom += 1 if class_label == "Anomaly" 
  end 
  puts "Finished testing Anomaly inputs #{correct_anom}/#{num_trial}" 
  return [correct_norm, correct_anom] 
end 

def execute(domain, max_iter, num_cells, p_anom, p_norm, thresh) 
  migrated = train_system(domain, max_iter, num_cells, p_anom, p_norm, thresh) 
  test_system(migrated, domain, p_anom, p_norm) 
  return migrated 
end 

if __FILE__ == $0 
  # problem configuration 
  domain = {} 
  domain["Normal"] = Array.new(50){|i| i} 
  domain["Anomaly"] = Array.new(5){|i| (i+1)*10} 
  domain["Normal"] = domain["Normal"] - domain["Anomaly"] 
  p_anomaly = 0.70 
  p_normal = 0.95 
  # algorithm configuration 
  iterations = 100 
  num_cells = 10 
  thresh = [5,15] 
  # execute the algorithm 
  execute(domain, iterations, num_cells, p_anomaly, p_normal, thresh) 
end 

Listing 7.5: Deterministic Dendritic Cell Algorithm in Ruby 



7.6.7 References 

Primary Sources 

The Dendritic Cell Algorithm was proposed by Greensmith, Aickelin and 
Cayzer describing the inspiring biological system and providing experimental 
results on a classification problem [4]. This work was followed shortly by 
a second study into the algorithm by Greensmith, Twycross, and Aickelin, 
focusing on computer security instances of anomaly detection and 
classification problems [6]. 



Learn More 

The Dendritic Cell Algorithm was the focus of Greensmith's thesis, which 
provides a detailed discussion of the method's abstraction from the inspiring 
biological system, and a review of the technique's limitations [1]. A formal 
presentation of the algorithm is provided by Greensmith et al. [5]. 
Greensmith and Aickelin proposed the Deterministic Dendritic Cell Algorithm 
(dDCA) that seeks to remove some of the stochastic decisions from the 
method, reduce its complexity, and make it more amenable to analysis 
[2]. Stibor et al. provide a theoretical analysis of the Deterministic Dendritic 
Cell Algorithm, considering the discrimination boundaries of single dendritic 
cells in the system [9]. Greensmith and Aickelin provide a detailed overview 
of the Dendritic Cell Algorithm focusing on the information processing 
principles of the inspiring biological systems as a book chapter [3]. 

Chapter 8 

Neural Algorithms 



8.1 Overview 

This chapter describes Neural Algorithms. 

8.1.1 Biological Neural Networks 

A Biological Neural Network refers to the information processing elements of 
the nervous system, organized as a collection of neural cells, called neurons, 
that are interconnected in networks and interact with each other using 
electrochemical signals. A biological neuron generally comprises dendrites, 
which receive input signals from other neurons via synapses, a cell body, 
and an axon. The neuron reacts to input signals and may produce an output 
signal on its output connection, called the axon. 

The study of biological neural networks falls within the domain of 
neuroscience, which is a branch of biology concerned with the nervous 
system. Neuroanatomy is a subject that is concerned with the structure 
and function of groups of neural networks, both with regard to parts of the 
brain and the structures that lead from and to the brain from the rest of the 
body. Neuropsychology is another discipline concerned with the structure 
and function of the brain as they relate to abstract psychological behaviors. 
For further information, refer to a good textbook on any of these general 
topics. 

8.1.2 Artificial Neural Networks 

The field of Artificial Neural Networks (ANN) is concerned with the 
investigation of computational models inspired by theories and observation 
of the structure and function of biological networks of neural cells in the 
brain. They are generally designed as models for addressing mathematical, 
computational, and engineering problems. As such, there is a lot 
of interdisciplinary research in mathematics, neurobiology and computer 
science. 

An Artificial Neural Network is generally comprised of a collection 
of artificial neurons that are interconnected in order to perform some 
computation on input patterns and create output patterns. They are 
adaptive systems capable of modifying their internal structure, typically 
the weights between nodes in the network, allowing them to be used for a 
variety of function approximation problems such as classification, regression, 
feature extraction and content addressable memory. 

Given that the focus of the field is on performing computation with 
networks of discrete computing units, the field is traditionally called a 
'connectionist' paradigm of Artificial Intelligence and 'Neural Computation'. 

There are many types of neural networks, many of which fall into one of 
two categories: 

• Feed-forward Networks where input is provided on one side of the 
network and the signals are propagated forward (in one direction) 
through the network structure to the other side where output signals 
are read. These networks may be comprised of one cell, one layer 
or multiple layers of neurons. Some examples include the Perceptron, 
Radial Basis Function Networks, and the multi-layer perceptron 
networks. 

• Recurrent Networks where cycles in the network are permitted 
and the structure may be fully interconnected. Examples include the 
Hopfield Network and Bidirectional Associative Memory. 

Artificial Neural Network structures are made up of nodes and weights 
which typically require training based on samples of patterns from a problem 
domain. Some examples of learning strategies include: 

• Supervised Learning where the network is exposed to the input 
that has a known expected answer. The internal state of the network 
is modified to better match the expected result. Examples of this 
learning method include the Back-propagation algorithm and the Hebb 
rule. 

• Unsupervised Learning where the network is exposed to input 
patterns from which it must discern meaning and extract features. 
The most common type of unsupervised learning is competitive learn- 
ing where neurons compete based on the input pattern to produce 
an output pattern. Examples include Neural Gas, Learning Vector 
Quantization, and the Self-Organizing Map. 

Artificial Neural Networks are typically difficult to configure and slow 
to train, but once prepared are very fast in application. They are generally 
used for function approximation-based problem domains and prized for their 
capabilities of generalization and tolerance to noise. They are known to 
have the limitation of being opaque, meaning there is little explanation to 
the subject matter expert as to why decisions were made, only how. 

There are many excellent reference texts for the field of Artificial Neural 
Networks, some selected texts include: "Neural Networks for Pattern 
Recognition" by Bishop [1], "Neural Smithing: Supervised Learning in 
Feedforward Artificial Neural Networks" by Reed and Marks II [8] and 
"An Introduction to Neural Networks" by Gurney [2]. 

8.1.3 Extensions 

There are many other algorithms and classes of algorithm that were not 
described from the field of Artificial Neural Networks, not limited to: 

• Radial Basis Function Network: A network where activation 
functions are controlled by Radial Basis Functions [4]. 

• Neural Gas: Another self-organizing and unsupervised competitive 
learning algorithm. Unlike SOM (and more like LVQ), the nodes 
are not organized into a lower-dimensional structure, instead the 
competitive Hebbian-learning like rule is applied to connect, order, 
and adapt nodes in feature space [5-7]. 

• Hierarchical Temporal Memory: A neural network system based 
on models of some of the structural and algorithmic properties of the 
neocortex [3]. 

8.2 Perceptron 



Perceptron. 

8.2.1 Taxonomy 

The Perceptron algorithm belongs to the field of Artificial Neural Networks 
and more broadly Computational Intelligence. It is a single layer feedforward 
neural network (single cell network) that inspired many extensions and 
variants, not limited to ADALINE and the Widrow-Hoff learning rules. 

8.2.2 Inspiration 

The Perceptron is inspired by the information processing of a single neural 
cell (called a neuron). A neuron accepts input signals via its dendrites, which 
pass the electrical signal down to the cell body. The axon carries the 
signal out to synapses, which are the connections of a cell's axon to 
other cells' dendrites. In a synapse, the electrical activity is converted into 
molecular activity (neurotransmitter molecules crossing the synaptic cleft 
and binding with receptors). The molecular binding develops an electrical 
signal which is passed on to the connected cell's dendrites. 



8.2.3 Strategy 

The information processing objective of the technique is to model a given 
function by modifying internal weightings of input signals to produce an 
expected output signal. The system is trained using a supervised learning 
method, where the error between the system's output and a known expected 
output is presented to the system and used to modify its internal state. 
State is maintained in a set of weightings on the input signals. The weights 
are used to represent an abstraction of the mapping of input vectors to 
the output signal for the examples that the system was exposed to during 
training. 

8.2.4 Procedure 

The Perceptron is comprised of a data structure (weights) and separate 
procedures for training and applying the structure. The structure is really 
just a vector of weights (one for each expected input) and a bias term. 

Algorithm 8.2.1 provides pseudocode for training the Perceptron. A 
weight is initialized for each input plus an additional weight for a fixed 
bias constant input that is almost always set to 1.0. The activation of the 
network to a given input pattern is calculated as follows: 

$$activation = \left( \sum_{k=1}^{n} w_k \times x_{ki} \right) + w_{bias} \times 1.0 \qquad (8.1)$$ 

where $n$ is the number of weights and inputs, $x_{ki}$ is the $k^{th}$ attribute on 
the $i^{th}$ input pattern, and $w_{bias}$ is the bias weight. The weights are updated 
as follows: 

$$w_i(t+1) = w_i(t) + \alpha \times (e(t) - a(t)) \times x_i(t) \qquad (8.2)$$ 

where $w_i$ is the $i^{th}$ weight at time $t$ and $t+1$, $\alpha$ is the learning rate, 
$e(t)$ and $a(t)$ are the expected and actual output at time $t$, and $x_i$ is the 
$i^{th}$ input. This update process is applied to each weight in turn (as well as 
the bias weight with its constant input). 
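The two equations above can be traced by hand on a single pattern. The following is a minimal sketch, assuming the bias weight is stored as the last element of the weight vector and a step transfer is applied to the activation; the function names and the numeric values are illustrative only.

```ruby
# weights holds one weight per input followed by the bias weight
def activation(weights, inputs)
  sum = weights.last * 1.0 # bias weight times the constant input 1.0
  inputs.each_with_index { |x, k| sum += weights[k] * x }
  sum
end

# apply the update rule to every weight, including the bias weight
def update(weights, inputs, expected, actual, alpha)
  signals = inputs + [1.0] # append the constant bias input
  weights.each_index.map do |i|
    weights[i] + alpha * (expected - actual) * signals[i]
  end
end

weights = [0.2, -0.4, 0.1]
inputs = [1.0, 1.0]
a = activation(weights, inputs)   # 0.2 - 0.4 + 0.1, a negative activation
actual = (a >= 0) ? 1.0 : 0.0     # step transfer gives 0.0
new_weights = update(weights, inputs, 1.0, actual, 0.1)
puts new_weights.inspect          # each weight moves up by roughly 0.1
```

Because the expected output was 1.0 and the actual output 0.0, every weight attached to an active input is nudged upward by the learning rate, which raises the activation for this pattern on the next presentation.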



Algorithm 8.2.1: Pseudocode for the Perceptron. 
Input: ProblemSize, InputPatterns, iterations_max, learn_rate 
Output: Weights 

Weights ← InitializeWeights(ProblemSize); 
for i = 1 to iterations_max do 
    Pattern_i ← SelectInputPattern(InputPatterns); 
    Activation_i ← ActivateNetwork(Pattern_i, Weights); 
    Output_i ← TransferActivation(Activation_i); 
    UpdateWeights(Pattern_i, Output_i, learn_rate); 
end 
return Weights; 



8.2.5 Heuristics 

• The Perceptron can be used to approximate arbitrary linear functions 
and can be used for regression or classification problems. 

• The Perceptron cannot learn a non-linear mapping between the input 
and output attributes. The XOR problem is a classical example of a 
problem that the Perceptron cannot learn. 

• Input and output values should be normalized such that x ∈ [0, 1). 

• The learning rate (α ∈ [0, 1]) controls the amount of change each error 
has on the system; lower learning rates are common, such as 0.1. 

• The weights can be updated in an online manner (after the exposure 
to each input pattern) or in batch (after a fixed number of patterns 
have been observed). 

• Batch updates are expected to be more stable than online updates for 
some complex problems. 

• A bias weight is used with a constant input signal to provide stability 
to the learning process. 

• A step transfer function is commonly used to transfer the activation 
to a binary output value: 1 ← activation ≥ 0, otherwise 0. 

• It is good practice to expose the system to input patterns in a different 
random order each enumeration through the input set. 

• The initial weights are typically small random values, typically ∈ 
[0, 0.5]. 
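The linear-separability limit noted in the heuristics can be observed directly. The sketch below trains a single step-transfer unit on the OR and XOR truth tables and counts correct outputs. It is an illustration under stated assumptions: weights start at zero and patterns are presented in a fixed order so that the run is repeatable, which departs from the random initialization and random ordering recommended above.

```ruby
# step-transfer output of a single unit; the bias weight is stored last
def step_output(weights, inputs)
  sum = weights.last
  inputs.each_with_index { |x, i| sum += weights[i] * x }
  (sum >= 0) ? 1.0 : 0.0
end

# train online on two-input patterns, then count correctly classified patterns
def train_and_score(patterns, epochs=100, l_rate=0.1)
  weights = [0.0, 0.0, 0.0] # fixed start (assumption) for a repeatable run
  epochs.times do
    patterns.each do |x1, x2, expected|
      inputs = [x1.to_f, x2.to_f]
      error = expected - step_output(weights, inputs)
      inputs.each_with_index { |x, i| weights[i] += l_rate * error * x }
      weights[2] += l_rate * error # bias input is the constant 1.0
    end
  end
  patterns.count { |x1, x2, expected| step_output(weights, [x1, x2]) == expected }
end

or_problem  = [[0,0,0], [0,1,1], [1,0,1], [1,1,1]]
xor_problem = [[0,0,0], [0,1,1], [1,0,1], [1,1,0]]
puts "OR correct:  #{train_and_score(or_problem)}/4"
puts "XOR correct: #{train_and_score(xor_problem)}/4"
```

No setting of the three weights can classify all four XOR patterns, so the XOR score stays below 4 however long training runs, whereas the OR patterns are linearly separable and can all be learned.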

8.2.6 Code Listing 

Listing 8.1 provides an example of the Perceptron algorithm implemented in 
the Ruby Programming Language. The problem is the classical OR boolean 
problem, where the inputs of the boolean truth table are provided as the two 
inputs and the result of the boolean OR operation is expected as output. 

The algorithm was implemented using an online learning method, meaning 
the weights are updated after each input pattern is observed. A step 
transfer function is used to convert the activation into a binary output 
∈ {0, 1}. Each epoch presents every pattern in the domain to train the 
weights, and the same patterns are then used to demonstrate what the 
network has learned. A bias weight is used for stability with a 
constant input of 1.0. 

def random_vector(minmax) 
  return Array.new(minmax.size) do |i| 
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand()) 
  end 
end 

# one weight per input plus a trailing bias weight 
def initialize_weights(problem_size) 
  minmax = Array.new(problem_size + 1) {[-1.0,1.0]} 
  return random_vector(minmax) 
end 

def update_weights(num_inputs, weights, input, out_exp, out_act, l_rate) 
  num_inputs.times do |i| 
    weights[i] += l_rate * (out_exp - out_act) * input[i] 
  end 
  # the bias weight is updated with its constant input of 1.0 
  weights[num_inputs] += l_rate * (out_exp - out_act) * 1.0 
end 

def activate(weights, vector) 
  sum = weights[weights.size-1] * 1.0 
  vector.each_with_index do |input, i| 
    sum += weights[i] * input 
  end 
  return sum 
end 

# step transfer function 
def transfer(activation) 
  return (activation >= 0) ? 1.0 : 0.0 
end 

def get_output(weights, vector) 
  activation = activate(weights, vector) 
  return transfer(activation) 
end 

def train_weights(weights, domain, num_inputs, iterations, lrate) 
  iterations.times do |epoch| 
    error = 0.0 
    domain.each do |pattern| 
      input = Array.new(num_inputs) {|k| pattern[k].to_f} 
      output = get_output(weights, input) 
      expected = pattern.last.to_f 
      error += (output - expected).abs 
      update_weights(num_inputs, weights, input, expected, output, lrate) 
    end 
    puts "> epoch=#{epoch}, error=#{error}" 
  end 
end 

def test_weights(weights, domain, num_inputs) 
  correct = 0 
  domain.each do |pattern| 
    input_vector = Array.new(num_inputs) {|k| pattern[k].to_f} 
    output = get_output(weights, input_vector) 
    correct += 1 if output.round == pattern.last 
  end 
  puts "Finished test with a score of #{correct}/#{domain.size}" 
  return correct 
end 

def execute(domain, num_inputs, iterations, learning_rate) 
  weights = initialize_weights(num_inputs) 
  train_weights(weights, domain, num_inputs, iterations, learning_rate) 
  test_weights(weights, domain, num_inputs) 
  return weights 
end 

if __FILE__ == $0 
  # problem configuration 
  or_problem = [[0,0,0], [0,1,1], [1,0,1], [1,1,1]] 
  inputs = 2 
  # algorithm configuration 
  iterations = 20 
  learning_rate = 0.1 
  # execute the algorithm 
  execute(or_problem, inputs, iterations, learning_rate) 
end 



Listing 8.1: Perceptron in Ruby 

8.2.7 References 

Primary Sources 

The Perceptron algorithm was proposed by Rosenblatt in 1958 [3]. Rosen- 
blatt proposed a range of neural network structures and methods. The 
'Perceptron' as it is known is in fact a simplification of Rosenblatt's models 
by Minsky and Papert for the purposes of analysis [1]. An early proof of 
convergence was provided by Novikoff [2]. 

Learn More 

Minsky and Papert wrote the classical text titled "Perceptrons" in 1969 
that is known to have discredited the approach, suggesting it was limited 
to linear discrimination, which reduced research in the area for decades 
afterward [1]. 

8.3 Back-propagation 

Back-propagation, Backpropagation, Error Back Propagation, Backprop, 
Delta-rule. 

8.3.1 Taxonomy 

The Back-propagation algorithm is a supervised learning method for 
multi-layer feed-forward networks from the field of Artificial Neural Networks 
and more broadly Computational Intelligence. The name refers to the 
backward propagation of error during the training of the network. 
Back-propagation is the basis for many variations and extensions for training 
multi-layer feed-forward networks not limited to Vogl's Method (Bold Driver), 
Delta-Bar-Delta, Quickprop, and Rprop. 

8.3.2 Inspiration 

Feed-forward neural networks are inspired by the information processing of 
one or more neural cells (called neurons). A neuron accepts input signals 
via its dendrites, which pass the electrical signal down to the cell body. The 
axon carries the signal out to synapses, which are the connections of a 
cell's axon to other cells' dendrites. In a synapse, the electrical activity is 
converted into molecular activity (neurotransmitter molecules crossing the 
synaptic cleft and binding with receptors). The molecular binding develops 
an electrical signal which is passed on to the connected cell's dendrites. The 
Back-propagation algorithm is a training regime for multi-layer feed-forward 
neural networks and is not directly inspired by the learning processes of the 
biological system. 

8.3.3 Strategy 

The information processing objective of the technique is to model a given 
function by modifying internal weightings of input signals to produce an 
expected output signal. The system is trained using a supervised learning 
method, where the error between the system's output and a known expected 
output is presented to the system and used to modify its internal state. State 
is maintained in a set of weightings on the input signals. The weights are used 
to represent an abstraction of the mapping of input vectors to the output 
signal for the examples that the system was exposed to during training. Each 
layer of the network provides an abstraction of the information processing 
of the previous layer, allowing the combination of sub-functions and higher 
order modeling. 






8.3.4 Procedure 

The Back-propagation algorithm is a method for training the weights in 
a multi-layer feed-forward neural network. As such, it requires a network 
structure to be defined of one or more layers where one layer is fully 
connected to the next layer. A standard network structure is one input 
layer, one hidden layer, and one output layer. The method is primarily 
concerned with adapting the weights to the calculated error in the presence 
of input patterns, and the method is applied backward from the network 
output layer through to the input layer. 

Algorithm 8.3.1 provides high-level pseudocode for preparing a network 
using the Back-propagation training method. A weight is initialized for 
each input plus an additional weight for a fixed bias constant input that is 
almost always set to 1.0. The activation of a single neuron to a given input 
pattern is calculated as follows: 

$$activation = \left( \sum_{k=1}^{n} w_k \times x_{ki} \right) + w_{bias} \times 1.0 \qquad (8.3)$$ 

where $n$ is the number of weights and inputs, $x_{ki}$ is the $k^{th}$ attribute 
on the $i^{th}$ input pattern, and $w_{bias}$ is the bias weight. A logistic transfer 
function (sigmoid) is used to calculate the output for a neuron ∈ [0, 1] and 
provide nonlinearities between the input and output signals: 
$\frac{1}{1+\exp(-a)}$, where $a$ represents the neuron activation. 
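As a small sketch of the forward step for one neuron, the following combines the activation of Equation 8.3 with the logistic transfer function. It assumes the bias weight is stored last in the weight vector; the function name is illustrative.

```ruby
# output of a single neuron: weighted sum plus bias, squashed by the sigmoid
def neuron_output(weights, inputs)
  a = weights.last * 1.0 # bias weight times the constant input 1.0
  inputs.each_with_index { |x, k| a += weights[k] * x }
  1.0 / (1.0 + Math.exp(-a))
end

puts neuron_output([0.0, 0.0, 0.0], [1.0, 1.0]) # zero activation maps to the midpoint
puts neuron_output([2.0, 2.0, 1.0], [1.0, 1.0]) # large positive activation approaches 1
puts neuron_output([-2.0, -2.0, -1.0], [1.0, 1.0]) # large negative activation approaches 0
```

The output is always strictly between 0 and 1, and it changes smoothly with the weights, which is what allows an error derivative to be computed for each weight.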

The weight updates use the delta rule, specifically a modified delta rule 
where error is propagated backward through the network, starting at the 
output layer and weighted back through the previous layers. The following 
describes the back-propagation of error and weight updates for a single 
pattern. 

An error signal is calculated for each node and propagated back through 
the network. For the output nodes this is the sum of the error between the 
node outputs and the expected outputs: 

$$es_i = (c_i - o_i) \times td_i \qquad (8.4)$$ 

where $es_i$ is the error signal for the $i^{th}$ node, $c_i$ is the expected output 
and $o_i$ is the actual output for the $i^{th}$ node. The $td$ term is the derivative 
of the output of the $i^{th}$ node. If the sigmoid transfer function is used, $td_i$ 
would be $o_i \times (1 - o_i)$. For the hidden nodes, the error signal is the sum of 
the weighted error signals from the next layer. 

$$es_i = \left( \sum_{k=1}^{n} (w_{ik} \times es_k) \right) \times td_i \qquad (8.5)$$ 

where $es_i$ is the error signal for the $i^{th}$ node, $w_{ik}$ is the weight between 
the $i^{th}$ and the $k^{th}$ nodes, and $es_k$ is the error signal of the $k^{th}$ node. 
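The two error-signal equations can be checked on a tiny example. This is a minimal sketch assuming the sigmoid transfer function, so the derivative term is the output times one minus the output; the function names and numeric values are illustrative.

```ruby
# derivative of the sigmoid expressed in terms of the node's output
def transfer_derivative(output)
  output * (1.0 - output)
end

# error signal of an output node
def output_error_signal(expected, output)
  (expected - output) * transfer_derivative(output)
end

# error signal of a hidden node; weights_to_next[k] is the weight from this
# node to the k-th node of the next layer, next_signals[k] that node's signal
def hidden_error_signal(output, weights_to_next, next_signals)
  sum = 0.0
  weights_to_next.each_with_index { |w, k| sum += w * next_signals[k] }
  sum * transfer_derivative(output)
end

es_out = output_error_signal(1.0, 0.5)               # (1.0 - 0.5) * 0.25
es_hid = hidden_error_signal(0.5, [0.4], [es_out])   # (0.4 * es_out) * 0.25
puts "output es=#{es_out} hidden es=#{es_hid}"
```

The hidden node's signal is smaller than the output node's because it is scaled both by the connecting weight and by its own transfer derivative.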






The error derivatives for each weight are calculated by combining the 
input to each node and the error signal for the node. 

$$ed_i = \sum_{k=1}^{n} es_i \times x_k \qquad (8.6)$$ 

where $ed_i$ is the error derivative for the $i^{th}$ node, $es_i$ is the error signal 
for the $i^{th}$ node and $x_k$ is the input from the $k^{th}$ node in the previous layer. 
This process includes the bias input that has a constant value. 

Weights are updated in a direction that reduces the error derivative $ed_i$ 
(error assigned to the weight), metered by a learning coefficient. 

$$w_i(t+1) = w_i(t) + (ed_k \times learn_{rate}) \qquad (8.7)$$ 

where $w_i(t+1)$ is the updated $i^{th}$ weight, $ed_k$ is the error derivative for 
the $k^{th}$ node and $learn_{rate}$ is an update coefficient parameter. 
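A short sketch of these last two steps for a single node follows. It reads the error derivative per weight (the node's error signal multiplied by the input arriving on that weight), which is the interpretation used by Listing 8.2; the function names and values are assumptions for illustration.

```ruby
# one error derivative per weight: error signal times the incoming input,
# with the constant bias input 1.0 appended last
def error_derivatives(error_signal, inputs)
  (inputs + [1.0]).map { |x| error_signal * x }
end

# move each weight by its derivative, metered by the learning rate
def updated_weights(weights, derivs, learn_rate)
  weights.each_index.map { |j| weights[j] + (derivs[j] * learn_rate) }
end

derivs = error_derivatives(0.125, [1.0, 0.0])
puts derivs.inspect # the inactive input receives no blame
puts updated_weights([0.5, 0.5, 0.0], derivs, 0.5).inspect
```

A weight whose input was 0.0 is left unchanged, since it could not have contributed to the node's error for this pattern.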

Algorithm 8.3.1: Pseudocode for Back-propagation. 
Input: ProblemSize, InputPatterns, iterations_max, learn_rate 
Output: Network 

Network ← ConstructNetworkLayers(); 
Network_weights ← InitializeWeights(Network, ProblemSize); 
for i = 1 to iterations_max do 
    Pattern_i ← SelectInputPattern(InputPatterns); 
    Output_i ← ForwardPropagate(Pattern_i, Network); 
    BackwardPropagateError(Pattern_i, Output_i, Network); 
    UpdateWeights(Pattern_i, Output_i, Network, learn_rate); 
end 
return Network; 



8.3.5 Heuristics 

• The Back-propagation algorithm can be used to train a multi-layer 
network to approximate arbitrary non-linear functions and can be 
used for regression or classification problems. 

• Input and output values should be normalized such that x ∈ [0, 1). 

• The weights can be updated in an online manner (after the exposure 
to each input pattern) or in batch (after a fixed number of patterns 
have been observed). 

• Batch updates are expected to be more stable than online updates for 
some complex problems. 

• A logistic (sigmoid) transfer function is commonly used to transfer the 
activation to an output value ∈ [0, 1], although other transfer functions 
can be used such as the hyperbolic tangent (tanh), Gaussian, and 
softmax. 

• It is good practice to expose the system to input patterns in a different 
random order each enumeration through the input set. 

• The initial weights are typically small random values ∈ [0, 0.5]. 

• Typically a small number of layers are used, such as 2-4, given that an 
increase in layers results in an increase in the complexity of the system 
and the time required to train the weights. 

• The learning rate can be varied during training, and it is common to 
introduce a momentum term to limit the rate of change. 

• The weights of a given network can be initialized with a global 
optimization method before being refined using the Back-propagation 
algorithm. 

• One output node is common for regression problems, whereas one 
output node per class is common for classification problems. 
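The momentum heuristic can be sketched for a single weight as follows. The change applied to the weight blends the current error derivative with a fraction of the previous change; the function name and the coefficient values are assumptions for illustration.

```ruby
# returns the new weight and the change that was applied, so the change
# can be fed back in as last_delta on the next update
def momentum_step(weight, deriv, last_delta, learn_rate, momentum)
  delta = (learn_rate * deriv) + (momentum * last_delta)
  [weight + delta, delta]
end

weight, last = 0.5, 0.0
2.times do
  weight, last = momentum_step(weight, 0.2, last, 0.5, 0.5)
  puts "weight=#{weight.round(4)} delta=#{last.round(4)}"
end
```

With a constant derivative the second change is larger than the first because part of the previous change carries over; when successive derivatives alternate in sign the carried-over part partly cancels them, which smooths the trajectory of the weight.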

8.3.6 Code Listing 

Listing 8.2 provides an example of the Back-propagation algorithm imple- 
mented in the Ruby Programming Language. The problem is the classical 
XOR boolean problem, where the inputs of the boolean truth table are 
provided as inputs and the result of the boolean XOR operation is expected 
as output. This is a classical problem for Back-Propagation because it was 
the problem instance referenced by Minsky and Papert in their analysis 
of the Perceptron highlighting the limitations of their simplified models of 
neural networks [3]. 

The algorithm was implemented using a batch learning method, meaning 
the weights are updated after each epoch of patterns is observed. A logistic 
(sigmoid) transfer function is used to convert the activation into an output 
signal. Weight updates occur at the end of each epoch using the accumulated 
deltas. A momentum term is used in conjunction with the past weight 
update to ensure the last update influences the current update, reducing 
large changes. 

A three layer network is demonstrated with 2 nodes in the input layer 
(two inputs), 2 nodes in the hidden layer and 1 node in the output layer, 
which is sufficient for the chosen problem. A bias weight is used on each 
neuron for stability with a constant input of 1.0. The learning process is 
separated into four steps: forward propagation, backward propagation of 
error, calculation of error derivatives (assigning blame to the weights) and 
the weight update. This separation facilitates easy extensions such as adding 
a momentum term and/or weight decay to the update process. 

def random_vector(minmax) 
  return Array.new(minmax.size) do |i| 
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand()) 
  end 
end 

def initialize_weights(num_weights) 
  minmax = Array.new(num_weights) {[-rand(),rand()]} 
  return random_vector(minmax) 
end 

# weighted sum of the inputs plus the bias weight (stored last) 
def activate(weights, vector) 
  sum = weights[weights.size-1] * 1.0 
  vector.each_with_index do |input, i| 
    sum += weights[i] * input 
  end 
  return sum 
end 

# logistic (sigmoid) transfer function 
def transfer(activation) 
  return 1.0 / (1.0 + Math.exp(-activation)) 
end 

def transfer_derivative(output) 
  return output * (1.0 - output) 
end 

def forward_propagate(net, vector) 
  net.each_with_index do |layer, i| 
    # the first layer reads the pattern, later layers read the previous outputs 
    input = (i==0) ? vector : Array.new(net[i-1].size){|k| net[i-1][k][:output]} 
    layer.each do |neuron| 
      neuron[:activation] = activate(neuron[:weights], input) 
      neuron[:output] = transfer(neuron[:activation]) 
    end 
  end 
  return net.last[0][:output] 
end 

def backward_propagate_error (network, expected_output) 
network. size. times do |n| 

index = network. size - 1 - n 
if index == network. size-1 
neuron = network [index] [0] # assume one node in output layer 
error = (expected_output - neuron [: output] ) 

neuron [: delta] = error * transfer_derivative (neuron [: output]) 
else 

network [index] .each_with_index do I neuron, k| 
sum = 0.0 

# only sum errors weighted by connection to the current k'th neuron 
network [index+1] . each do | next_neuron | 

sum += (next_neuron[: weights] [k] * next_neuron [: delta] ) 
end 

neuron [: delta] = sum * transfer_derivative (neuron [: output]) 
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end 
end 
end 
end 

def calculate_error_derivatives_f or_weights(net , vector) 
net . each_with_index do | layer, i| 

input=(i==0)? vector : Array .new(net [i-1] . size){ I k|net [i-1] [k] [: output] } 
layer. each do I neuron I 

input . each_with_ index do | signal, j| 

neuron [:deriv] [j] += neuron [: delta] * signal 
end 

neuron [: deriv] [-1] += neuron [: delta] * 1.0 
end 
end 
end 

def update_weights (network, Irate, mom=0.8) 
network. each do | layer | 
layer. each do | neuron I 

neuron [: weights] . each_with_index do |w, j| 

delta = (Irate * neuron [: deriv] [j] ) + (neuron [: last_delta] [j] * mom) 
neuron [: weights] [j] += delta 
neuron [: last_delta] [j] = delta 
neuron [: deriv] [j] = 0.0 
end 
end 
end 
end 

def train_network (network, domain, num_inputs, iterations. Irate) 

correct = 0 

iterations . times do | epoch | 
domain. each do | pattern | 

vector , expected= Array. new(num_ inputs) i | k| pattern[k] .to_f>, pattern. last 
output = forward_propagate (network, vector) 
correct += 1 if output. round == expected 
backward_propagate_error (network, expected) 
calculate_error_derivatives_f or_weights (network, vector) 
end 

update_weights (network, Irate) 
if (epoch+1) .modulo (100) == 0 

puts "> epoch=#{ epoch+1}, Correct=#{correct}/#{100*domain. size}" 
correct = 0 
end 
end 
end 

def test .network (network, domain, num.inputs) 
correct = 0 

domain. each do | pattern! 

input_vector = Array .new(num_inputs) {|k| pattern [k] . to_f} 

output = forward.propaigate (network, input .vector) 

correct += 1 if output. round == pattern. last 
end 

puts "Finished test with a score of #{correct}/#{domain. length}" 
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return correct 
end 

def create_neuroii(num_inputs) 

return { : weights=>initialize_weights (num_inputs+l) , 
: last _de It a=> Array . new (num_inputs+l) -[0 . 0}- , 
: deriv=> Array . new(nuin_inputs+l) {0 . 0}} 

end 

def execute (domain, num_inputs , iterations, nuin_nodes, Irate) 
network = [] 

network << Array. new(nuin_nodes)-[create_neuron(num_inputs)}- 
network << Array .new(l)-[create_neuron(network. last . size)} 

puts "Topology: #-[num_inputs}- #-[network. inj ect ( " " ) { | m, i | m+" . sizej- "}}" 
train_network(network, domain, num_inputs, iterations, Irate) 
test_network(network, domain, nxim_inputs) 
return network 
end 

if __FILE__ == $0 

# problem configuration 

xor = [[0,0,0], [0,1,1], [1,0,1], [1,1,0]] 
inputs = 2 

# algorithm configuration 
learning_rate = 0.3 
num_hidden_nodes = 4 
iterations = 2000 

# execute the algorithm 

execute (xor, inputs, iterations, num_hidden_nodes , learning_rate) 
end 

Listing 8.2: Back-propagation in Ruby 



8.3.7 References 

Primary Sources 

The backward propagation of error method is credited to Bryson and Ho
in [1]. It was applied to the training of multi-layer networks and called
back-propagation by Rumelhart, Hinton and Williams in 1986 [5, 6]. This
effort and the collection of studies edited by Rumelhart and McClelland
helped to define the field of Artificial Neural Networks in the late 1980s
[7, 8].

Learn More 

A seminal book on the approach was "Backpropagation: theory,
architectures, and applications" by Chauvin and Rumelhart that provided an
excellent introduction (chapter 1) but also a collection of studies applying
and extending the approach [2]. Reed and Marks provide an excellent
treatment of feed-forward neural networks called "Neural Smithing" that
includes chapters dedicated to Back-propagation, the configuration of its
parameters, error surface and speed improvements [4].

8.4 Hopfield Network

Hopfield Network, HN, Hopfield Model. 

8.4.1 Taxonomy 

The Hopfield Network is a Neural Network and belongs to the field of Arti- 
ficial Neural Networks and Neural Computation. It is a Recurrent Neural 
Network and is related to other recurrent networks such as the Bidirec- 
tional Associative Memory (BAM). It is generally related to feedforward 
Artificial Neural Networks such as the Perceptron (Section 8.2) and the 
Back-propagation algorithm (Section 8.3). 

8.4.2 Inspiration 

The Hopfield Network algorithm is inspired by the associative memory
properties of the human brain.

8.4.3 Metaphor 

Through the training process, the weights in the network may be thought to 
minimize an energy function and slide down an energy surface. In a trained 
network, each pattern presented to the network provides an attractor, where 
progress is made towards the point of attraction by propagating information 
around the network. 

8.4.4 Strategy 

The information processing objective of the system is to associate the 
components of an input pattern with a holistic representation of the pattern 
called Content Addressable Memory (CAM). This means that once trained, 
the system will recall whole patterns, given a portion or a noisy version of 
the input pattern. 

8.4.5 Procedure 

The Hopfield Network is comprised of a graph data structure with weighted 
edges and separate procedures for training and applying the structure. The 
network structure is fully connected (a node connects to all other nodes 
except itself) and the edges (weights) between the nodes are bidirectional. 

The weights of the network can be learned via a one-shot method (one
iteration through the patterns) if all patterns to be memorized by the
network are known. Alternatively, the weights can be updated incrementally
using the Hebb rule where weights are increased or decreased based on
the difference between the actual and the expected output. The one-shot
calculation of the network weights for a single node occurs as follows:

$$w_{i,j} = \sum_{k=1}^{N} v_k^i \times v_k^j \qquad (8.8)$$

where $w_{i,j}$ is the weight between neuron $i$ and $j$, $N$ is the number of
input patterns, $v$ is the input pattern and $v_k^i$ is the $i^{th}$ attribute on
the $k^{th}$ input pattern.
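The one-shot calculation can be sketched as a full weight matrix. This is an illustrative sketch only: the function name one_shot_weights, the nested-array representation of the weights and the two sample patterns are assumptions made for the example.

```ruby
# Sketch of one-shot training: w[i][j] is the sum over all patterns of
# v[i] * v[j], with no self-connections (w[i][i] stays 0).
def one_shot_weights(patterns)
  size = patterns.first.size
  weights = Array.new(size) { Array.new(size, 0) }
  patterns.each do |v|
    size.times do |i|
      size.times do |j|
        weights[i][j] += v[i] * v[j] if i != j
      end
    end
  end
  return weights
end

# two assumed bipolar patterns of four attributes each
patterns = [[1, -1, 1, -1], [1, 1, -1, -1]]
one_shot_weights(patterns).each {|row| puts row.inspect}
```

The resulting matrix is symmetric because $v_k^i \times v_k^j = v_k^j \times v_k^i$, which matches the bidirectional edges of the network.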

The propagation of the information through the network can be
asynchronous, where a random node is selected each iteration, or synchronous,
where the output is calculated for each node before being applied to the
whole network. Propagation of the information continues until no more
changes are made or until a maximum number of iterations has completed,
after which the output pattern from the network can be read. The activation
for a single node is calculated as follows:

$$n_i = \sum_{j=1}^{n} w_{i,j} \times n_j \qquad (8.9)$$

where $n_i$ is the activation of the $i^{th}$ neuron, $w_{i,j}$ is the weight between
the nodes $i$ and $j$, and $n_j$ is the output of the $j^{th}$ neuron. The activation is
transferred into an output using a transfer function, typically a step function
as follows:

$$transfer(n_i) = \begin{cases} 1 & \text{if } n_i \geq \theta \\ -1 & \text{if } n_i < \theta \end{cases}$$

where the threshold $\theta$ is typically fixed at 0.
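The activation, step transfer and asynchronous propagation can be combined into a small recall routine. The sketch below is one possible arrangement under stated assumptions: the function names, the nested-array weight matrix, the stability check over all nodes and the three-node example are illustrative choices, not the structure used in the listing.

```ruby
# Step transfer function with the threshold fixed at 0.
def transfer(activation)
  return (activation >= 0) ? 1 : -1
end

# Activation of node i: weighted sum of the outputs of all other nodes.
def activation(weights, outputs, i)
  sum = 0
  outputs.each_with_index {|out, j| sum += weights[i][j] * out if i != j}
  return sum
end

# Asynchronous recall: update one randomly chosen node at a time and stop
# once no node would change its output, or after max_iter iterations.
def recall(weights, cue, max_iter=100)
  outputs = cue.dup
  max_iter.times do
    stable = (0...outputs.size).all? {|i|
      transfer(activation(weights, outputs, i)) == outputs[i]
    }
    break if stable
    i = rand(outputs.size)
    outputs[i] = transfer(activation(weights, outputs, i))
  end
  return outputs
end

# weights for the single stored pattern [1, -1, 1] (assumed example)
stored = [1, -1, 1]
weights = Array.new(3) {|i| Array.new(3) {|j| (i == j) ? 0 : stored[i] * stored[j]}}
puts recall(weights, [1, 1, 1]).inspect
```

The cue differs from the stored pattern in its middle attribute. Only that node has an activation that disagrees with its current output, so the network should settle on the stored pattern once that node has been selected.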

8.4.6 Heuristics 

• The Hopfield network may be used to solve the recall problem of
matching cues for an input pattern to an associated pre-learned
pattern.

• The transfer function for turning the activation of a neuron into an
output is typically a step function $f(a) \in \{-1, 1\}$ (preferred), or more
traditionally $f(a) \in \{0, 1\}$.

• The input vectors are typically normalized to boolean values
$x \in [-1, 1]$.

• The network can be propagated asynchronously (where a random
node is selected and output generated), or synchronously (where the
output for all nodes is calculated before being applied).

• Weights can be learned in a one-shot or incremental method based on
how much information is known about the patterns to be learned.

• All neurons in the network are typically both input and output neurons,
although other network topologies have been investigated (such as the
designation of input and output neurons).

• A Hopfield network has limits on the patterns it can store and retrieve
accurately from memory, described by $N \leq 0.15 \times n$ where $N$ is the
number of patterns that can be stored and retrieved and $n$ is the
number of nodes in the network.
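The storage heuristic in the last item can be turned into a quick check before choosing a network size. The function name max_patterns is an assumption made for this sketch, and the bound is a rule of thumb rather than a guarantee.

```ruby
# Rule-of-thumb capacity from the heuristic N <= 0.15 * n, computed with
# integer arithmetic to avoid floating point rounding.
def max_patterns(num_nodes)
  return (15 * num_nodes) / 100
end

[9, 25, 100].each do |n|
  puts "nodes=#{n}, patterns=#{max_patterns(n)}"
end
```

By this guideline a 9-node network is suited to about one pattern, so small demonstrations that store more than that may recall imperfectly on some runs.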

8.4.7 Code Listing 

Listing 8.3 provides an example of the Hopfield Network algorithm
implemented in the Ruby Programming Language. The problem is an instance
of a recall problem where patterns are described in terms of a 3 x 3 matrix
of binary values ($\in \{-1, 1\}$). Once the network has learned the patterns,
the system is exposed to perturbed versions of the patterns (with errors
introduced) and must respond with the correct pattern. Two patterns are
used in this example, specifically 'T' and 'U'.

The algorithm is an implementation of the Hopfield Network with a
one-shot training method for the network weights, given that all patterns are
already known. The information is propagated through the network using
an asynchronous method, which is repeated for a fixed number of iterations.
The patterns are displayed to the console during the testing of the network,
with the outputs converted from $\{-1, 1\}$ to $\{0, 1\}$ for readability.

# Create a random vector, each element drawn uniformly from its [min, max] bounds.
def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def initialize_weights(problem_size)
  minmax = Array.new(problem_size) {[-0.5,0.5]}
  return random_vector(minmax)
end

def create_neuron(num_inputs)
  neuron = {}
  neuron[:weights] = initialize_weights(num_inputs)
  return neuron
end

# Step transfer function with the threshold fixed at 0.
def transfer(activation)
  return (activation >= 0) ? 1 : -1
end

# Asynchronous propagation: update one randomly selected neuron and
# report whether its output changed.
def propagate_was_change?(neurons)
  i = rand(neurons.size)
  activation = 0
  neurons.each_with_index do |other, j|
    activation += other[:weights][i]*other[:output] if i!=j
  end
  output = transfer(activation)
  change = output != neurons[i][:output]
  neurons[i][:output] = output
  return change
end

# Present a pattern to the network and propagate for a fixed number of steps.
def get_output(neurons, pattern, evals=100)
  vector = pattern.flatten
  neurons.each_with_index {|neuron,i| neuron[:output] = vector[i]}
  evals.times { propagate_was_change?(neurons) }
  return Array.new(neurons.size){|i| neurons[i][:output]}
end

# One-shot training: each weight is the sum over all patterns of the
# product of the two connected attributes.
def train_network(neurons, patterns)
  neurons.each_with_index do |neuron, i|
    for j in ((i+1)...neurons.size) do
      next if i==j
      wij = 0.0
      patterns.each do |pattern|
        vector = pattern.flatten
        wij += vector[i]*vector[j]
      end
      neurons[i][:weights][j] = wij
      neurons[j][:weights][i] = wij
    end
  end
end

# Convert {-1, 1} values to {0, 1} for display.
def to_binary(vector)
  return Array.new(vector.size){|i| ((vector[i]==-1) ? 0 : 1)}
end

def print_patterns(provided, expected, actual)
  p, e, a = to_binary(provided), to_binary(expected), to_binary(actual)
  p1, p2, p3 = p[0..2].join(', '), p[3..5].join(', '), p[6..8].join(', ')
  e1, e2, e3 = e[0..2].join(', '), e[3..5].join(', '), e[6..8].join(', ')
  a1, a2, a3 = a[0..2].join(', '), a[3..5].join(', '), a[6..8].join(', ')
  puts "Provided   Expected   Got"
  puts "#{p1}    #{e1}    #{a1}"
  puts "#{p2}    #{e2}    #{a2}"
  puts "#{p3}    #{e3}    #{a3}"
end

# Number of attributes that differ between the two vectors.
def calculate_error(expected, actual)
  sum = 0
  expected.each_with_index do |v, i|
    sum += 1 if expected[i]!=actual[i]
  end
  return sum
end

# Flip num_errors distinct, randomly chosen attributes of the vector.
def perturb_pattern(vector, num_errors=1)
  perturbed = Array.new(vector)
  indices = [rand(perturbed.size)]
  while indices.size < num_errors do
    index = rand(perturbed.size)
    indices << index if !indices.include?(index)
  end
  indices.each {|i| perturbed[i] = ((perturbed[i]==1) ? -1 : 1)}
  return perturbed
end

def test_network(neurons, patterns)
  error = 0.0
  patterns.each do |pattern|
    vector = pattern.flatten
    perturbed = perturb_pattern(vector)
    output = get_output(neurons, perturbed)
    error += calculate_error(vector, output)
    print_patterns(perturbed, vector, output)
  end
  error = error / patterns.size.to_f
  puts "Final Result: avg pattern error=#{error}"
  return error
end

def execute(patterns, num_inputs)
  neurons = Array.new(num_inputs) { create_neuron(num_inputs) }
  train_network(neurons, patterns)
  test_network(neurons, patterns)
  return neurons
end

if __FILE__ == $0
  # problem configuration
  num_inputs = 9
  p1 = [[1,1,1],[-1,1,-1],[-1,1,-1]] # T
  p2 = [[1,-1,1],[1,-1,1],[1,1,1]] # U
  patterns = [p1, p2]
  # execute the algorithm
  execute(patterns, num_inputs)
end

Listing 8.3: Hopfield Network in Ruby 



8.4.8 References 

Primary Sources 

The Hopfield Network was proposed by Hopfield in 1982 where the basic
model was described and related to an abstraction of the inspiring biological
system [2]. This early work was extended by Hopfield to 'graded' neurons
capable of outputting a continuous value through use of a logistic (sigmoid)
transfer function [3]. An innovative work by Hopfield and Tank considered
the use of the Hopfield network for solving combinatorial optimization
problems, with a specific study into the system applied to instances of the
Traveling Salesman Problem [4]. This was achieved with a large number of
neurons and a representation that decoded the position of each city in the
tour as a sub-problem on which a customized network energy function had
to be minimized.

Learn More 

Popovici and Boncut provide a summary of the Hopfield Network algorithm 
with worked examples [5]. Overviews of the Hopfield Network are provided 
in most good books on Artificial Neural Networks, such as [6]. Hertz, 
Krogh, and Palmer present an in depth study of the field of Artificial Neural 
Networks with a detailed treatment of the Hopfield network from a statistical 
mechanics perspective [1]. 

8.5 Learning Vector Quantization

Learning Vector Quantization, LVQ. 

8.5.1 Taxonomy 

The Learning Vector Quantization algorithm belongs to the field of Artificial
Neural Networks and Neural Computation. More broadly it belongs to the field of
Computational Intelligence. The Learning Vector Quantization algorithm
is a supervised neural network that uses a competitive (winner-take-all)
learning strategy. It is related to other supervised neural networks such as the
Perceptron (Section 8.2) and the Back-propagation algorithm (Section 8.3).
It is related to other competitive learning neural networks such as the
Self-Organizing Map algorithm (Section 8.6) that is a similar algorithm
for unsupervised learning with the addition of connections between the
neurons. Additionally, LVQ is a baseline technique that was defined with a
few variants (LVQ1, LVQ2, LVQ2.1, LVQ3, OLVQ1, and OLVQ3) as well as
many third-party extensions and refinements too numerous to list.

8.5.2 Inspiration 

The Learning Vector Quantization algorithm is related to the Self-Organizing
Map which is in turn inspired by the self-organizing capabilities of neurons
in the visual cortex.

8.5.3 Strategy 

The information processing objective of the algorithm is to prepare a set of
codebook (or prototype) vectors in the domain of the observed input data
samples and to use these vectors to classify unseen examples. An initially
random pool of vectors is prepared which are then exposed to training
samples. A winner-take-all strategy is employed where one or more of the
most similar vectors to a given input pattern are selected and adjusted to be
closer to the input vector, and in some cases runners-up are moved further
away from the winner. The repetition of this process results in a distribution
of codebook vectors in the input space which approximates the underlying
distribution of samples from the test dataset.

8.5.4 Procedure 

Vector Quantization is a technique from signal processing where density 
functions are approximated with prototype vectors for applications such as 
compression. Learning Vector Quantization is similar in principle, although 
the prototype vectors are learned through a supervised winner-take-all 
method. 

Algorithm 8.5.1 provides a high-level pseudocode for preparing codebook
vectors using the Learning Vector Quantization method. Codebook vectors
are initialized to small floating point values, or sampled from an available
dataset. The Best Matching Unit (BMU) is the codebook vector from the
pool that has the minimum distance to an input vector. A distance measure
between input patterns must be defined. For real-valued vectors, this is
commonly the Euclidean distance:

$$dist(x, c) = \sqrt{\sum_{i=1}^{n} (x_i - c_i)^2} \qquad (8.10)$$

where $n$ is the number of attributes, $x$ is the input vector and $c$ is a
given codebook vector.
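Selecting the Best Matching Unit with this distance measure can be sketched as follows. The function name best_matching_unit and the sample vectors are assumptions for illustration; codebook vectors are represented here as plain arrays without class labels.

```ruby
# Euclidean distance between two vectors of equal length.
def euclidean_distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i] - c2[i])**2.0}
  return Math.sqrt(sum)
end

# The BMU is the codebook vector with the smallest distance to the input.
def best_matching_unit(codebook_vectors, input)
  return codebook_vectors.min_by {|c| euclidean_distance(c, input)}
end

codebooks = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.25]]  # assumed sample values
input = [0.75, 0.75]
puts best_matching_unit(codebooks, input).inspect
```

Because the square root is monotonic, comparing squared distances would select the same BMU; the root only matters when the distance value itself is reported.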



Algorithm 8.5.1: Pseudocode for LVQ1.

Input: ProblemSize, InputPatterns, iterations_max, CodebookVectors_num, learn_rate
Output: CodebookVectors
CodebookVectors ← InitializeCodebookVectors(CodebookVectors_num, ProblemSize);
for i = 1 to iterations_max do
    Pattern_i ← SelectInputPattern(InputPatterns);
    Bmu_i ← SelectBestMatchingUnit(Pattern_i, CodebookVectors);
    foreach Bmu_i^attribute ∈ Bmu_i do
        if Bmu_i^class ≡ Pattern_i^class then
            Bmu_i^attribute ← Bmu_i^attribute + learn_rate × (Pattern_i^attribute − Bmu_i^attribute)
        else
            Bmu_i^attribute ← Bmu_i^attribute − learn_rate × (Pattern_i^attribute − Bmu_i^attribute)
        end
    end
end
return CodebookVectors;
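A single LVQ1 update step from the pseudocode can be worked through numerically. In this sketch the function name lvq1_update, the hash layout with :label and :vector keys, and the sample values are assumptions chosen so the results are exact.

```ruby
# One LVQ1 update of the best matching unit (bmu) for a training pattern:
# move toward the pattern if the classes match, away from it otherwise.
def lvq1_update(bmu, pattern, lrate)
  sign = (bmu[:label] == pattern[:label]) ? 1.0 : -1.0
  bmu[:vector].each_index do |i|
    bmu[:vector][i] += sign * lrate * (pattern[:vector][i] - bmu[:vector][i])
  end
  return bmu
end

pattern = {:label=>"A", :vector=>[0.75, 1.0]}
same = lvq1_update({:label=>"A", :vector=>[0.25, 0.5]}, pattern, 0.5)
other = lvq1_update({:label=>"B", :vector=>[0.25, 0.5]}, pattern, 0.5)
puts "same class:      #{same[:vector].inspect}"
puts "different class: #{other[:vector].inspect}"
```

With a learning rate of 0.5, the matching codebook vector moves halfway to the pattern, giving [0.5, 0.75], while the mismatched one moves the same distance in the opposite direction, giving [0.0, 0.25].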



8.5.5 Heuristics 

• Learning Vector Quantization was designed for classification problems 
that have existing data sets that can be used to supervise the learning 
by the system. The algorithm does not support regression problems. 

• LVQ is non-parametric, meaning that it does not rely on assumptions
about the structure of the function that it is approximating.

• Real values in input vectors should be normalized such that $x \in [0, 1)$.

• Euclidean distance is commonly used to measure the distance between 
real-valued vectors, although other distance measures may be used 
(such as dot product), and data specific distance measures may be 
required for non-scalar attributes. 

• There should be sufficient training iterations to expose all the training 
data to the model multiple times. 

• The learning rate is typically linearly decayed over the training period 
from an initial value to close to zero. 

• The more complex the class distribution, the more codebook vectors
will be required; some problems may need thousands.

• Multiple passes of the LVQ training algorithm are suggested for more 
robust usage, where the first pass has a large learning rate to prepare 
the codebook vectors and the second pass has a low learning rate and 
runs for a long time (perhaps 10-times more iterations). 
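The linear decay of the learning rate mentioned above can be sketched as a small function. The name decayed_rate and the sampled iterations are assumptions for illustration; the formula matches the one described for the training loop.

```ruby
# Linear decay from the initial learning rate towards zero.
def decayed_rate(initial_rate, iter, iterations)
  return initial_rate * (1.0 - (iter.to_f / iterations.to_f))
end

[0, 250, 500, 750, 999].each do |iter|
  puts "iter=#{iter}, lrate=#{decayed_rate(0.3, iter, 1000)}"
end
```

The rate starts at its initial value, reaches half of it midway through training, and stays just above zero on the final iteration.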

8.5.6 Code Listing 

Listing 8.4 provides an example of the Learning Vector Quantization
algorithm implemented in the Ruby Programming Language. The problem is a
contrived classification problem in a 2-dimensional domain
$x \in [0, 1], y \in [0, 1]$ with two classes: 'A' ($x \in [0, 0.4999999],
y \in [0, 0.4999999]$) and 'B' ($x \in [0.5, 1], y \in [0.5, 1]$).

The algorithm was implemented using the LVQ1 variant where the best
matching codebook vector is located and moved toward the input vector if
it is the same class, or away if the classes differ. A linear decay was used for
the learning rate that was updated after each pattern was exposed to the
model. The implementation can easily be extended to the other variants of
the method.

# Create a random vector, each element drawn uniformly from its [min, max] bounds.
def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

# Draw a labeled sample from the region of a randomly chosen class.
def generate_random_pattern(domain)
  classes = domain.keys
  selected_class = rand(classes.size)
  pattern = {:label=>classes[selected_class]}
  pattern[:vector] = random_vector(domain[classes[selected_class]])
  return pattern
end

# Create codebook vectors with random class labels and random positions.
def initialize_vectors(domain, num_vectors)
  classes = domain.keys
  codebook_vectors = []
  num_vectors.times do
    selected_class = rand(classes.size)
    codebook = {}
    codebook[:label] = classes[selected_class]
    codebook[:vector] = random_vector([[0,1],[0,1]])
    codebook_vectors << codebook
  end
  return codebook_vectors
end

def euclidean_distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

# The BMU is the codebook vector closest to the pattern.
def get_best_matching_unit(codebook_vectors, pattern)
  best, b_dist = nil, nil
  codebook_vectors.each do |codebook|
    dist = euclidean_distance(codebook[:vector], pattern[:vector])
    best,b_dist = codebook,dist if b_dist.nil? or dist<b_dist
  end
  return best
end

# LVQ1 update: move the BMU toward the pattern if the classes match,
# away from it otherwise.
def update_codebook_vector(bmu, pattern, lrate)
  bmu[:vector].each_with_index do |v,i|
    error = pattern[:vector][i]-bmu[:vector][i]
    if bmu[:label] == pattern[:label]
      bmu[:vector][i] += lrate * error
    else
      bmu[:vector][i] -= lrate * error
    end
  end
end

def train_network(codebook_vectors, domain, iterations, learning_rate)
  iterations.times do |iter|
    pat = generate_random_pattern(domain)
    bmu = get_best_matching_unit(codebook_vectors, pat)
    # linearly decay the learning rate over the training period
    lrate = learning_rate * (1.0-(iter.to_f/iterations.to_f))
    if iter.modulo(10)==0
      puts "> iter=#{iter}, got=#{bmu[:label]}, exp=#{pat[:label]}"
    end
    update_codebook_vector(bmu, pat, lrate)
  end
end

def test_network(codebook_vectors, domain, num_trials=100)
  correct = 0
  num_trials.times do
    pattern = generate_random_pattern(domain)
    bmu = get_best_matching_unit(codebook_vectors, pattern)
    correct += 1 if bmu[:label] == pattern[:label]
  end
  puts "Done. Score: #{correct}/#{num_trials}"
  return correct
end

def execute(domain, iterations, num_vectors, learning_rate)
  codebook_vectors = initialize_vectors(domain, num_vectors)
  train_network(codebook_vectors, domain, iterations, learning_rate)
  test_network(codebook_vectors, domain)
  return codebook_vectors
end

if __FILE__ == $0
  # problem configuration
  domain = {"A"=>[[0,0.4999999],[0,0.4999999]],"B"=>[[0.5,1],[0.5,1]]}
  # algorithm configuration
  learning_rate = 0.3
  iterations = 1000
  num_vectors = 20
  # execute the algorithm
  execute(domain, iterations, num_vectors, learning_rate)
end

Listing 8.4: Learning Vector Quantization in Ruby 

8.5.7 References 

Primary Sources 

The Learning Vector Quantization algorithm was described by Kohonen in 
1988 [2], and was further described in the same year by Kohonen [1] and 
benchmarked by Kohonen, Barna, and Chrisley [5]. 

Learn More 

Kohonen provides a detailed overview of the state of LVQ algorithms and
variants (LVQ1, LVQ2, and LVQ2.1) [3]. The technical report that comes
with the LVQ_PAK software (written by Kohonen and his students) provides
both an excellent summary of the technique and its main variants, as well as
summarizing the important considerations when applying the approach [6].
The seminal book on Learning Vector Quantization and the Self-Organizing
Map is "Self-Organizing Maps" by Kohonen, which includes a chapter
(Chapter 6) dedicated to LVQ and its variants [4].

8.6 Self-Organizing Map

Self-Organizing Map, SOM, Self-Organizing Feature Map, SOFM, Kohonen
Map, Kohonen Network.

8.6.1 Taxonomy 

The Self-Organizing Map algorithm belongs to the field of Artificial Neural 
Networks and Neural Computation. More broadly it belongs to the field of 
Computational Intelligence. The Self-Organizing Map is an unsupervised 
neural network that uses a competitive (winner-take-all) learning strategy. 
It is related to other unsupervised neural networks such as the Adaptive
Resonance Theory (ART) method. It is related to other competitive learning
neural networks such as the Neural Gas Algorithm, and the Learning
Vector Quantization algorithm (Section 8.5), which is a similar algorithm for
classification without connections between the neurons. Additionally, SOM
is a baseline technique that has inspired many variations and extensions,
not limited to the Adaptive-Subspace Self-Organizing Map (ASSOM).

8.6.2 Inspiration 

The Self-Organizing Map is inspired by postulated feature maps of neurons in
the brain comprised of feature-sensitive cells that provide ordered projections
between neuronal layers, such as those that may exist in the retina and
cochlea. For example, there are acoustic feature maps that respond to
sounds to which an animal is most frequently exposed, and tonotopic maps
that may be responsible for the order preservation of acoustic resonances.

8.6.3 Strategy 

The information processing objective of the algorithm is to optimally place
a topology (grid or lattice) of codebook or prototype vectors in the domain
of the observed input data samples. An initially random pool of vectors is
prepared which are then exposed to training samples. A winner-take-all
strategy is employed where the most similar vector to a given input pattern
is selected, then the selected vector and neighbors of the selected vector
are updated to more closely resemble the input pattern. The repetition of this
process results in a distribution of codebook vectors in the input space
which approximates the underlying distribution of samples from the test
dataset. The result is the mapping of the topology of codebook vectors to
the underlying structure in the input samples which may be summarized or
visualized to reveal topologically preserved features from the input space in
a low-dimensional projection.

8.6.4 Procedure

The Self-Organizing map is comprised of a collection of codebook vectors 
connected together in a topological arrangement, typically a one dimensional 
line or a two dimensional grid. The codebook vectors themselves represent 
prototypes (points) within the domain, whereas the topological structure 
imposes an ordering between the vectors during the training process. The 
result is a low dimensional projection or approximation of the problem 
domain which may be visualized, or from which clusters may be extracted. 

Algorithm 8.6.1 provides a high-level pseudocode for preparing codebook 
vectors using the Self-Organizing Map method. Codebook vectors are 
initialized to small floating point values, or sampled from the domain. The 
Best Matching Unit (BMU) is the codebook vector from the pool that has 
the minimum distance to an input vector. A distance measure between 
input patterns must be defined. For real-valued vectors, this is commonly 
the Euclidean distance: 

$$dist(x, c) = \sqrt{\sum_{i=1}^{n} (x_i - c_i)^2} \qquad (8.11)$$

where $n$ is the number of attributes, $x$ is the input vector and $c$ is a
given codebook vector.

The neighbors of the BMU in the topological structure of the network
are selected using a neighborhood size that is linearly decreased during
the training of the network. The BMU and all selected neighbors are then
adjusted toward the input vector using a learning rate that is also decreased
linearly with the training cycles:

$$c_i(t+1) = c_i(t) + learn_{rate}(t) \times (x_i - c_i(t)) \qquad (8.12)$$

where $c_i(t)$ is the $i^{th}$ attribute of a codebook vector at time $t$, $learn_{rate}$
is the current learning rate, and $x_i$ is the $i^{th}$ attribute of an input vector.

The neighborhood is typically square (called bubble) where all neigh- 
borhood nodes are updated using the same learning rate for the iteration, 
or Gaussian where the learning rate is proportional to the neighborhood 
distance using a Gaussian distribution (neighbors further away from the 
BMU are updated less). 
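The difference between the two neighborhood types can be sketched as a per-node learning rate. The function names, the grid coordinates and the choice of using the neighborhood radius as the standard deviation of the Gaussian are assumptions of this sketch, not details taken from the text.

```ruby
# Distance between two nodes measured on the grid, not in the input space.
def grid_distance(a, b)
  return Math.sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2)
end

# Bubble neighborhood: every node within the radius gets the full rate.
def bubble_rate(bmu_coord, coord, radius, lrate)
  return (grid_distance(bmu_coord, coord) <= radius) ? lrate : 0.0
end

# Gaussian neighborhood: the rate falls off smoothly with grid distance.
# Using the radius as the standard deviation is an assumption of this sketch.
def gaussian_rate(bmu_coord, coord, radius, lrate)
  d = grid_distance(bmu_coord, coord)
  return lrate * Math.exp(-(d**2) / (2.0 * radius**2))
end

bmu = [2, 2]
[[2, 2], [2, 3], [4, 2], [0, 0]].each do |coord|
  b = bubble_rate(bmu, coord, 2.0, 0.5)
  g = gaussian_rate(bmu, coord, 2.0, 0.5)
  puts "coord=#{coord.inspect}, bubble=#{b}, gaussian=#{g.round(4)}"
end
```

The bubble function gives an all-or-nothing update inside the radius, whereas the Gaussian function gives the BMU the full rate and progressively smaller rates to nodes further away on the grid.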

8.6.5 Heuristics 

• The Self-Organizing Map was designed for unsupervised learning
problems such as feature extraction, visualization and clustering. Some
extensions of the approach can label the prepared codebook vectors
which can be used for classification.

• SOM is non-parametric, meaning that it does not rely on assumptions
about the structure of the function that it is approximating.

Algorithm 8.6.1: Pseudocode for the SOM.

Input: InputPatterns, iterations_max, learn_rate^init, neighborhood_size^init,
       Grid_width, Grid_height
Output: CodebookVectors
CodebookVectors ← InitializeCodebookVectors(Grid_width, Grid_height, InputPatterns);
for i = 1 to iterations_max do
    learn_rate^i ← CalculateLearningRate(i, learn_rate^init);
    neighborhood_size^i ← CalculateNeighborhoodSize(i, neighborhood_size^init);
    Pattern_i ← SelectInputPattern(InputPatterns);
    Bmu_i ← SelectBestMatchingUnit(Pattern_i, CodebookVectors);
    Neighborhood ← Bmu_i;
    Neighborhood ← SelectNeighbors(Bmu_i, CodebookVectors, neighborhood_size^i);
    foreach Vector_i ∈ Neighborhood do
        foreach Vector_i^attribute ∈ Vector_i do
            Vector_i^attribute ← Vector_i^attribute + learn_rate^i × (Pattern_i^attribute − Vector_i^attribute)
        end
    end
end
return CodebookVectors;



• Real values in input vectors should be normalized such that $x \in [0, 1)$.

• Euclidean distance is commonly used to measure the distance between 
real-valued vectors, although other distance measures may be used 
(such as dot product), and data specific distance measures may be 
required for non-scalar attributes. 

• There should be sufficient training iterations to expose all the training 
data to the model multiple times. 

• The more complex the class distribution, the more codebook vectors
will be required; some problems may need thousands.

• Multiple passes of the SOM training algorithm are suggested for more
robust usage, where the first pass has a large learning rate to prepare
the codebook vectors and the second pass has a low learning rate and
runs for a long time (perhaps 10-times more iterations).

• The SOM can be visualized by calculating a Unified Distance Matrix
(U-Matrix) that highlights the relationships between the nodes in
the chosen topology. A Principal Component Analysis (PCA) or
Sammon's Mapping can be used to visualize just the nodes of the
network without their inter-relationships.

• A rectangular 2D grid topology is typically used for a SOM, although 
toroidal and sphere topologies can be used. Hexagonal grids have 
demonstrated better results on some problems and grids with higher 
dimensions have been investigated. 

• The neuron positions can be updated incrementally or in batch mode
(once per epoch, after being exposed to all training samples). Batch-mode
training is generally expected to result in a more stable network.

• The learning rate and neighborhood size parameters typically decrease 
linearly with the training iterations, although non-linear functions 
may be used. 
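
The decay of the learning rate and neighborhood size described in the last heuristic can be sketched as follows. The linear form is the common choice mentioned above. The exponential form is only one possible non-linear alternative, and both the function names and the `time_constant` parameter are assumptions made for illustration.

```ruby
# Linear decay: falls from `initial` at iter = 0 toward zero as iter
# approaches `iterations`.
def linear_decay(initial, iter, iterations)
  initial * (1.0 - (iter.to_f / iterations.to_f))
end

# Exponential decay (an assumed non-linear alternative): falls quickly at
# first and then levels off. `time_constant` is a hypothetical tuning value.
def exponential_decay(initial, iter, time_constant)
  initial * Math.exp(-iter.to_f / time_constant.to_f)
end

if __FILE__ == $0
  iterations = 100
  [0, 25, 50, 75, 99].each do |iter|
    lin = linear_decay(0.3, iter, iterations)
    exp = exponential_decay(0.3, iter, iterations / 4.0)
    puts "iter=#{iter} linear=#{lin.round(4)} exponential=#{exp.round(4)}"
  end
end
```

Either function could supply the per-iteration learning rate or neighborhood size; which schedule works better is problem dependent.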

8.6.6 Code Listing 

Listing 8.5 provides an example of the Self-Organizing Map algorithm
implemented in the Ruby Programming Language. The problem is a feature
detection problem, where the network is expected to learn a predefined
shape based on being exposed to samples in the domain. The domain is
two-dimensional x, y ∈ [0, 1], where a shape is pre-defined as a square in
the middle of the domain x, y ∈ [0.3, 0.6]. The system is initialized to
vectors within the domain, although it is only exposed to samples within the
pre-defined shape during training. The expectation is that the system will
model the shape based on the observed samples.

The algorithm is an implementation of the basic Self-Organizing Map
algorithm based on the description in Chapter 3 of the seminal book on
the technique [5]. The implementation is configured with a 4 × 5 grid
of nodes, the Euclidean distance measure is used to determine the BMU
and neighbors, and a Bubble neighborhood function is used. Error rates are
presented to the console, and the codebook vectors themselves are described
before and after training. The learning process is incremental rather than
batch, for simplicity.
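
The Bubble neighborhood function mentioned above can be sketched as a weight applied to each node's update. The Gaussian variant is shown only for contrast; it is a commonly used alternative and is not part of the listing. The function names are assumptions for illustration.

```ruby
# Bubble neighborhood: every node within `radius` of the BMU on the grid
# receives the full update (weight 1.0); all other nodes receive none.
def bubble_weight(grid_dist, radius)
  (grid_dist <= radius) ? 1.0 : 0.0
end

# Gaussian neighborhood (assumed alternative, requires radius > 0): the
# update weight falls off smoothly with grid distance from the BMU.
def gaussian_weight(grid_dist, radius)
  Math.exp(-(grid_dist**2) / (2.0 * radius**2))
end

if __FILE__ == $0
  [0.0, 1.0, 2.0, 3.0].each do |d|
    puts "dist=#{d} bubble=#{bubble_weight(d, 2.0)} gaussian=#{gaussian_weight(d, 2.0).round(4)}"
  end
end
```

With a Bubble function the weight is implicit: a node is either in the neighborhood and updated, or it is left unchanged.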

An extension to this implementation would be to visualize the resulting
network structure in the domain - shrinking from a mesh that covers the
whole domain, down to a mesh that only covers the pre-defined shape within
the domain.

# Create a random vector, each component drawn uniformly from its [min, max] bounds
def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

# Create one randomly positioned codebook vector per node of the grid
def initialize_vectors(domain, width, height)
  codebook_vectors = []
  width.times do |x|
    height.times do |y|
      codebook = {}
      codebook[:vector] = random_vector(domain)
      codebook[:coord] = [x,y]
      codebook_vectors << codebook
    end
  end
  return codebook_vectors
end



def euclidean_distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

# Find the codebook vector closest to the pattern (the BMU) and its distance
def get_best_matching_unit(codebook_vectors, pattern)
  best, b_dist = nil, nil
  codebook_vectors.each do |codebook|
    dist = euclidean_distance(codebook[:vector], pattern)
    best,b_dist = codebook,dist if b_dist.nil? or dist<b_dist
  end
  return [best, b_dist]
end



# Bubble neighborhood: all nodes within neigh_size of the BMU on the grid
def get_vectors_in_neighborhood(bmu, codebook_vectors, neigh_size)
  neighborhood = []
  codebook_vectors.each do |other|
    if euclidean_distance(bmu[:coord], other[:coord]) <= neigh_size
      neighborhood << other
    end
  end
  return neighborhood
end

# Move a codebook vector toward the pattern by a fraction lrate of the error
def update_codebook_vector(codebook, pattern, lrate)
  codebook[:vector].each_with_index do |v,i|
    error = pattern[i]-codebook[:vector][i]
    codebook[:vector][i] += lrate * error
  end
end



def train_network(vectors, shape, iterations, l_rate, neighborhood_size)
  iterations.times do |iter|
    pattern = random_vector(shape)
    # learning rate and neighborhood size decrease linearly with iterations
    lrate = l_rate * (1.0-(iter.to_f/iterations.to_f))
    neigh_size = neighborhood_size * (1.0-(iter.to_f/iterations.to_f))
    bmu,dist = get_best_matching_unit(vectors, pattern)
    neighbors = get_vectors_in_neighborhood(bmu, vectors, neigh_size)
    neighbors.each do |node|
      update_codebook_vector(node, pattern, lrate)
    end
    puts ">training: neighbors=#{neighbors.size}, bmu_dist=#{dist}"
  end
end

# Report the [min, max] bounds covered by the codebook vectors in each dimension
def summarize_vectors(vectors)
  minmax = Array.new(vectors.first[:vector].size){[1,0]}
  vectors.each do |c|
    c[:vector].each_with_index do |v,i|
      minmax[i][0] = v if v<minmax[i][0]
      minmax[i][1] = v if v>minmax[i][1]
    end
  end
  s = ""
  minmax.each_with_index {|bounds,i| s << "#{i}=#{bounds.inspect} "}
  puts "Vector details: #{s}"
  return minmax
end

# Average distance from random samples of the shape to their BMU
def test_network(codebook_vectors, shape, num_trials=100)
  error = 0.0
  num_trials.times do
    pattern = random_vector(shape)
    bmu,dist = get_best_matching_unit(codebook_vectors, pattern)
    error += dist
  end
  error /= num_trials.to_f
  puts "Finished, average error=#{error}"
  return error
end

def execute(domain, shape, iterations, l_rate, neigh_size, width, height)
  vectors = initialize_vectors(domain, width, height)
  summarize_vectors(vectors)
  train_network(vectors, shape, iterations, l_rate, neigh_size)
  test_network(vectors, shape)
  summarize_vectors(vectors)
  return vectors
end

if __FILE__ == $0
  # problem configuration
  domain = [[0.0,1.0],[0.0,1.0]]
  shape = [[0.3,0.6],[0.3,0.6]]
  # algorithm configuration
  iterations = 100
  l_rate = 0.3
  neigh_size = 5
  width, height = 4, 5
  # execute the algorithm
  execute(domain, shape, iterations, l_rate, neigh_size, width, height)
end



Listing 8.5: Self-Organizing Map in Ruby 
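
As a starting point for visualizing a trained map, a U-Matrix can be sketched from codebook vectors stored in the same form as the listing (a hash with `:vector` and `:coord` keys). The function name `u_matrix` and the use of the four immediate grid neighbors are assumptions made for illustration; other neighborhood definitions are possible.

```ruby
def euclidean_distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

# For each node, average the input-space distance to its immediate grid
# neighbors (left, right, down, up). Returns a hash keyed by grid coordinate.
def u_matrix(codebook_vectors)
  by_coord = {}
  codebook_vectors.each {|c| by_coord[c[:coord]] = c}
  result = {}
  codebook_vectors.each do |c|
    x, y = c[:coord]
    adjacent = [[x-1,y], [x+1,y], [x,y-1], [x,y+1]]
    dists = adjacent.map {|coord| by_coord[coord]}.compact.map do |other|
      euclidean_distance(c[:vector], other[:vector])
    end
    # a node without grid neighbors gets a value of zero
    result[c[:coord]] = dists.empty? ? 0.0 : dists.inject(0.0) {|a,b| a+b} / dists.size
  end
  return result
end

if __FILE__ == $0
  nodes = [
    {:vector=>[0.0,0.0], :coord=>[0,0]},
    {:vector=>[3.0,4.0], :coord=>[1,0]}
  ]
  u_matrix(nodes).each {|coord, value| puts "#{coord.inspect}: #{value}"}
end
```

Large values mark nodes whose grid neighbors are far away in the input space, which is what makes cluster boundaries visible when the values are plotted over the grid.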

8.6.7 References

Primary Sources 

The Self-Organizing Map was proposed by Kohonen in 1982 in a study
that included the mathematical basis for the approach, a summary of related
physiology, and simulation on demonstration problem domains using one
and two dimensional topological structures [3]. This work was closely related
to two other papers published at close to the same time on topological maps
and self-organization [1, 2].

Learn More 

Kohonen provides a detailed introduction and summary of the Self-Organizing
Map in a journal article [4]. Kohonen et al. provide a practical presentation
of the algorithm and heuristics for configuration in the technical report
written to accompany the released SOM-PAK implementation of the
algorithm for academic research [6]. The seminal book on the technique is
"Self-Organizing Maps" by Kohonen, which includes chapters dedicated to
the description of the basic approach, physiological interpretations of the
algorithm, variations, and summaries of application areas [5].

Chapter 9

Advanced Topics 



This chapter discusses a number of advanced topics that may be considered 
once one or more of the algorithms described in this book have been 
mastered. 

The topics in this section consider some practical concerns such as: 

• How to implement an algorithm using a different programming paradigm 
(Section 9.1). 

• How to devise and investigate a new biologically-inspired algorithm 
(Section 9.2). 

• How to test algorithm implementations to ensure they are implemented 
correctly (Section 9.3). 

• How to visualize problems, algorithm behavior and candidate solutions 
(Section 9.4). 

• How to direct these algorithms toward practical problem solving
(Section 9.5).

• Issues to consider when benchmarking and comparing the capabilities 
of algorithms (Section 9.6). 

The objective of this chapter is to illustrate the concerns and skills
necessary for taking the algorithms described in this book into the
real world.

9.1 Programming Paradigms

This section discusses three standard programming paradigms that may be
used to implement the algorithms described throughout the book:

• Procedural Programming (Section 9.1.1) 

• Object-Oriented Programming (Section 9.1.2) 

• Flow Programming (Section 9.1.3) 

Each paradigm is described and an example implementation is provided 
using the Genetic Algorithm (described in Section 3.2) as a context. 

9.1.1 Procedural Programming 

This section considers the implementation of algorithms from the Clever 
Algorithms project in the Procedural Programming Paradigm. 

Description 

The procedural programming paradigm (also called imperative programming)
is concerned with defining a linear procedure or sequence of programming
statements. A key feature of the paradigm is the partitioning
of functionality into small discrete re-usable modules called procedures
(subroutines or functions) that act like small programs themselves with their
own scope, inputs and outputs. A procedural code example is executed
from a single point of control or entry point which calls out into declared
procedures, which in turn may call other procedures.

Procedural programming was an early so-called 'high-level programming
paradigm' (compared to lower-level machine code) and is the most common
and well understood form of programming. Newer paradigms (such as
Object-Oriented programming) and modern business programming languages
(such as C++, Java and C#) are built on the principles of procedural
programming.

All algorithms in this book were implemented using a procedural programming
paradigm in the Ruby Programming Language. A procedural
representation was chosen to provide the most transferable instantiation
of the algorithm implementations. Many languages support the procedural
paradigm and procedural code examples are expected to be easily ported to
popular paradigms such as object-oriented and functional.
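
The structure described above (small procedures with their own inputs and outputs, called from a single entry point) can be illustrated with a minimal sketch. It uses a simple random search on the OneMax problem rather than the Genetic Algorithm, purely to keep the example short; the procedure names are assumptions for illustration.

```ruby
# Procedure: create a random string of '0' and '1' characters
def random_bitstring(num_bits)
  Array.new(num_bits) { (rand < 0.5) ? "1" : "0" }.join
end

# Procedure: OneMax cost, the number of '1' characters in the string
def onemax(bitstring)
  bitstring.count("1")
end

# Procedure: random search, which calls the two procedures above
def random_search(num_bits, max_iterations)
  best = nil
  max_iterations.times do
    candidate = random_bitstring(num_bits)
    score = onemax(candidate)
    if best.nil? or score > best[:fitness]
      best = {:bitstring=>candidate, :fitness=>score}
    end
  end
  return best
end

# Single entry point
if __FILE__ == $0
  best = random_search(16, 100)
  puts "best: f=#{best[:fitness]}, s=#{best[:bitstring]}"
end
```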

Example 

Listing 3.1 in Section 3.2 provides an example of the Genetic Algorithm
implemented in the Ruby Programming Language using the procedural
programming paradigm.

9.1.2 Object-Oriented Programming

This section considers the implementation of algorithms from the Clever 
Algorithms project in the Object-Oriented Programming Paradigm. 

Description 

The Object-Oriented Programming (OOP) paradigm is concerned with 
modeling problems in terms of entities called objects that have attributes 
and behaviors (data and methods) and interact with other entities using 
message passing (calling methods on other entities). An object developer 
defines a class or template for the entity, which is instantiated or constructed 
and then may be used in the program. 

Objects can extend other objects, inheriting some or all of the attributes 
and behaviors from the parent providing specific modular reuse. Objects can 
be treated as a parent type (an object in its inheritance tree) allowing the 
use or application of the objects in the program without the caller knowing 
the specifics of the behavior or data inside the object. This general property 
is called polymorphism, which exploits the encapsulation of attributes and 
behavior within objects and their capability of being treated (viewed or 
interacted with) as a parent type. 

Organizing functionality into objects allows for additional constructs
such as abstract types where functionality is only partially defined and
must be completed by descendant objects, overriding where descendant
objects re-define behavior defined in a parent object, and static classes and
behaviors where behavior is executed on the object template rather than
the object instance. For more information on Object-Oriented programming
and software design refer to a good textbook on the subject, such as Booch
[1] or Meyer [3].

There are common ways of solving discrete problems using object-oriented
programs called patterns. They are organizations of behavior
and data that have been abstracted and presented as a solution or idiom
for a class of problem. The Strategy Pattern is an object-oriented pattern
that is suited to implementing an algorithm. This pattern is intended to
encapsulate the behavior of an algorithm as a strategy object where different
strategies can be used interchangeably on a given context or problem
domain. This strategy can be useful in situations where the performance or
capability of a range of different techniques needs to be assessed on a given
problem (such as algorithm racing or bake-offs). Additionally, the problem
or context can also be modeled as an interchangeable object, allowing both
algorithms and problems to be used interchangeably. This method is used in
object-oriented algorithm frameworks. For more information on the strategy
pattern or object-oriented design patterns in general, refer to Gamma et al.
[2].

Example

Listing 9.1 provides an example of the Genetic Algorithm implemented in 
the Ruby Programming Language using the Object-Oriented Programming 
Paradigm. 

The implementation provides general problem and strategy classes that
define their behavioral expectations. A OneMax problem class and a
GeneticAlgorithm strategy class are specified. The algorithm makes few
assumptions of the problem other than it can assess candidate solutions and
determine whether a given solution is optimal. The problem makes very
few assumptions about candidate solutions other than they are map data
structures that contain a binary string and fitness key-value pairs. The use
of the Strategy Pattern allows a new algorithm to easily be defined to work
with the existing problem, and new problems to be defined for the
Genetic Algorithm to execute.

Note that Ruby does not support abstract classes, so this construct 
is simulated by defining methods that raise an exception if they are not 
overridden by descendant classes. 

# A problem template
class Problem
  def assess(candidate_solution)
    raise "A problem has not been defined"
  end

  def is_optimal?(candidate_solution)
    raise "A problem has not been defined"
  end
end

# A strategy template
class Strategy
  def execute(problem)
    raise "A strategy has not been defined!"
  end
end

# An implementation of the OneMax problem using the problem template
class OneMax < Problem

  attr_reader :num_bits

  def initialize(num_bits=64)
    @num_bits = num_bits
  end

  # Fitness is the number of '1' characters in the bitstring
  def assess(candidate_solution)
    if candidate_solution[:bitstring].length != @num_bits
      raise "Expected #{@num_bits} in candidate solution."
    end
    sum = 0
    candidate_solution[:bitstring].size.times do |i|
      sum += 1 if candidate_solution[:bitstring][i].chr =='1'
    end
    return sum
  end

  def is_optimal?(candidate_solution)
    return candidate_solution[:fitness] == @num_bits
  end
end

# An implementation of the Genetic algorithm using the strategy template
class GeneticAlgorithm < Strategy

  attr_reader :max_generations, :population_size, :p_crossover, :p_mutation

  def initialize(max_gens=100, pop_size=100, crossover=0.98,
    mutation=1.0/64.0)
    @max_generations = max_gens
    @population_size = pop_size
    @p_crossover = crossover
    @p_mutation = mutation
  end

  def random_bitstring(num_bits)
    return (0...num_bits).inject(""){|s,i| s<<((rand<0.5) ? "1" : "0")}
  end

  # Pick two distinct members at random and return the fitter one
  def binary_tournament(pop)
    i, j = rand(pop.size), rand(pop.size)
    j = rand(pop.size) while j==i
    return (pop[i][:fitness] > pop[j][:fitness]) ? pop[i] : pop[j]
  end

  # Flip each bit independently with probability p_mutation
  def point_mutation(bitstring)
    child = ""
    bitstring.size.times do |i|
      bit = bitstring[i].chr
      child << ((rand()<@p_mutation) ? ((bit=='1') ? "0" : "1") : bit)
    end
    return child
  end

  # With probability p_crossover, take each bit from either parent at random
  def uniform_crossover(parent1, parent2)
    return ""+parent1 if rand()>=@p_crossover
    child = ""
    parent1.length.times do |i|
      child << ((rand()<0.5) ? parent1[i].chr : parent2[i].chr)
    end
    return child
  end

  def reproduce(selected)
    children = []
    selected.each_with_index do |p1, i|
      # pair each parent with its neighbor in the selected list
      p2 = (i.modulo(2)==0) ? selected[i+1] : selected[i-1]
      p2 = selected[0] if i == selected.size-1
      child = {}
      child[:bitstring] = uniform_crossover(p1[:bitstring], p2[:bitstring])
      child[:bitstring] = point_mutation(child[:bitstring])
      children << child
      break if children.size >= @population_size
    end
    return children
  end

  def execute(problem)
    population = Array.new(@population_size) do |i|
      {:bitstring=>random_bitstring(problem.num_bits)}
    end
    population.each{|c| c[:fitness] = problem.assess(c)}
    best = population.sort{|x,y| y[:fitness] <=> x[:fitness]}.first
    @max_generations.times do |gen|
      selected = Array.new(population_size){|i| binary_tournament(population)}
      children = reproduce(selected)
      children.each{|c| c[:fitness] = problem.assess(c)}
      children.sort!{|x,y| y[:fitness] <=> x[:fitness]}
      best = children.first if children.first[:fitness] >= best[:fitness]
      population = children
      puts " > gen #{gen}, best: #{best[:fitness]}, #{best[:bitstring]}"
      break if problem.is_optimal?(best)
    end
    return best
  end
end

if __FILE__ == $0
  # problem configuration
  problem = OneMax.new
  # algorithm configuration
  strategy = GeneticAlgorithm.new
  # execute the algorithm
  best = strategy.execute(problem)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:bitstring]}"
end

Listing 9.1: Genetic Algorithm in Ruby using OOP 
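
To illustrate the interchangeability that the Strategy Pattern is intended to provide, the following sketch defines a second strategy that can be executed on the same problem object. The `RandomSearch` class is hypothetical and is not part of the book's listing. Condensed copies of the templates and of the OneMax problem are included only so that the sketch runs on its own.

```ruby
# Condensed copies of the problem and strategy templates
class Problem
  def assess(candidate_solution)
    raise "A problem has not been defined"
  end

  def is_optimal?(candidate_solution)
    raise "A problem has not been defined"
  end
end

class Strategy
  def execute(problem)
    raise "A strategy has not been defined!"
  end
end

class OneMax < Problem
  attr_reader :num_bits

  def initialize(num_bits=64)
    @num_bits = num_bits
  end

  def assess(candidate_solution)
    candidate_solution[:bitstring].count("1")
  end

  def is_optimal?(candidate_solution)
    candidate_solution[:fitness] == @num_bits
  end
end

# Hypothetical second strategy: sample random bitstrings and keep the best
class RandomSearch < Strategy
  def initialize(max_evaluations=1000)
    @max_evaluations = max_evaluations
  end

  def execute(problem)
    best = nil
    @max_evaluations.times do
      bits = Array.new(problem.num_bits) { (rand < 0.5) ? "1" : "0" }.join
      candidate = {:bitstring=>bits}
      candidate[:fitness] = problem.assess(candidate)
      best = candidate if best.nil? or candidate[:fitness] > best[:fitness]
      break if problem.is_optimal?(best)
    end
    return best
  end
end

if __FILE__ == $0
  problem = OneMax.new(16)
  # any object that responds to execute(problem) could be placed in this list
  [RandomSearch.new(500)].each do |strategy|
    best = strategy.execute(problem)
    puts "#{strategy.class.name}: f=#{best[:fitness]}, s=#{best[:bitstring]}"
  end
end
```

The calling code depends only on `execute(problem)`, so a strategy such as the Genetic Algorithm could be added to the list without changing the loop.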



9.1.3 Flow Programming 

This section considers the implementation of algorithms from the Clever 
Algorithms project in the Flow Programming paradigm. 

Description 

Flow, data-flow, or pipeline programming involves chaining a sequence of
smaller processes together and allowing a flow of information through the
sequence in order to perform the desired computation. Units in the flow are
considered black-boxes that communicate with each other using message
passing. The information that is passed between the units is considered a
stream and a given application may have one or more streams of potentially
varying direction. Discrete information in a stream is partitioned into
information packets which are passed from unit to unit via message buffers,
queues or similar data structures.

A flow organization allows computing units to be interchanged readily. 
It also allows for variations of the pipeline to be considered with minor 
reconfiguration. A flow or pipelining structure is commonly used by software 
frameworks for the organization within a given algorithm implementation, 
allowing the specification of operators that manipulate candidate solutions 
to be varied and interchanged. 

For more information on Flow Programming see a good textbook on the 
subject, such as Morrison [4]. 
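
The ideas above (black-box units, a stream of packets, and queues between units) can be sketched with a minimal two-unit pipeline. The helper names `start_unit` and `run_pipeline`, and the `:done` packet used to shut the pipeline down, are assumptions made for illustration rather than part of any particular flow framework.

```ruby
require 'thread'

# Start a unit in its own thread. It reads packets from queue_in, applies
# the given transform, and writes the result to queue_out. The hypothetical
# :done packet is passed along and then stops the unit.
def start_unit(queue_in, queue_out, &transform)
  Thread.new do
    loop do
      packet = queue_in.pop
      if packet == :done
        queue_out.push(:done)
        break
      end
      queue_out.push(transform.call(packet))
    end
  end
end

# Chain two units together and stream the inputs through them
def run_pipeline(inputs)
  source, middle, sink = Queue.new, Queue.new, Queue.new
  units = [
    start_unit(source, middle) {|x| x * 2},
    start_unit(middle, sink) {|x| x + 1}
  ]
  inputs.each {|x| source.push(x)}
  source.push(:done)
  units.each {|t| t.join}
  results = []
  until (packet = sink.pop) == :done
    results << packet
  end
  return results
end

if __FILE__ == $0
  puts run_pipeline([1, 2, 3]).inspect
end
```

Swapping a unit for another, or inserting a third unit between the two, only requires changing how the queues are connected.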

Example 

Listing 9.2 provides an example of the Genetic Algorithm implemented in 
the Ruby Programming Language using the Flow Programming paradigm. 
Each unit is implemented as an object that executes its logic within a 
standalone thread that reads input from the input queue and writes data 
to its output queue. The implementation shows four flow units organized 
into a cyclic graph where the output queue of one unit is used as the input 
of the next unit in the cycle (EvalFlowUnit to StopConditionUnit to 
SelectFlowUnit to VariationFlowUnit). 

Candidate solutions are the unit of data that is passed around in the flow
between units. When the system is started it does not have any information
to process until a set of random solutions is injected into the evaluation
unit's input queue. The solutions are evaluated and sent to the stop condition
unit where the constraints of the algorithm execution are tested (optima
found or maximum number of evaluations) and the candidates are passed on
to the selection flow unit. The selection unit collects a predefined number
of candidate solutions then passes the better solutions onto the variation
unit. The variation unit performs crossover and mutation on each pair of
candidate solutions and sends the results to the evaluation unit, completing
the cycle.

require 'thread'

# Generic flow unit
class FlowUnit
  attr_reader :queue_in, :queue_out, :thread

  def initialize(q_in=Queue.new, q_out=Queue.new)
    @queue_in, @queue_out = q_in, q_out
    start()
  end

  def execute
    raise "FlowUnit not defined!"
  end

  # Run execute repeatedly in a standalone thread
  def start
    puts "Starting flow unit: #{self.class.name}!"
    @thread = Thread.new do
      execute() while true
    end
  end
end

# Evaluation of solutions flow unit
class EvalFlowUnit < FlowUnit
  def onemax(bitstring)
    sum = 0
    bitstring.size.times {|i| sum+=1 if bitstring[i].chr=='1'}
    return sum
  end

  def execute
    data = @queue_in.pop
    data[:fitness] = onemax(data[:bitstring])
    @queue_out.push(data)
  end
end

# Stop condition flow unit
class StopConditionUnit < FlowUnit
  attr_reader :best, :num_bits, :max_evaluations, :evals

  def initialize(q_in=Queue.new, q_out=Queue.new, max_evaluations=10000,
    num_bits=64)
    @best, @evals = nil, 0
    @num_bits = num_bits
    @max_evaluations = max_evaluations
    super(q_in, q_out)
  end

  def execute
    data = @queue_in.pop
    if @best.nil? or data[:fitness] > @best[:fitness]
      @best = data
      puts " >new best: #{@best[:fitness]}, #{@best[:bitstring]}"
    end
    @evals += 1
    # stop this unit's thread once the optimum or the evaluation limit is reached
    if @best[:fitness]==@num_bits or @evals>=@max_evaluations
      puts "done! Solution: f=#{@best[:fitness]}, s=#{@best[:bitstring]}"
      @thread.exit()
    end
    @queue_out.push(data)
  end
end

# Fitness-based selection flow unit
class SelectFlowUnit < FlowUnit
  def initialize(q_in=Queue.new, q_out=Queue.new, pop_size=100)
    @pop_size = pop_size
    super(q_in, q_out)
  end

  def binary_tournament(pop)
    i, j = rand(pop.size), rand(pop.size)
    j = rand(pop.size) while j==i
    return (pop[i][:fitness] > pop[j][:fitness]) ? pop[i] : pop[j]
  end

  # Collect a full population, then emit pop_size tournament winners
  def execute
    population = Array.new
    population << @queue_in.pop while population.size < @pop_size
    @pop_size.times do |i|
      @queue_out.push(binary_tournament(population))
    end
  end
end

# Variation flow unit
class VariationFlowUnit < FlowUnit
  def initialize(q_in=Queue.new, q_out=Queue.new, crossover=0.98,
    mutation=1.0/64.0)
    @p_crossover = crossover
    @p_mutation = mutation
    super(q_in, q_out)
  end

  def uniform_crossover(parent1, parent2)
    return ""+parent1 if rand()>=@p_crossover
    child = ""
    parent1.length.times do |i|
      child << ((rand()<0.5) ? parent1[i].chr : parent2[i].chr)
    end
    return child
  end

  def point_mutation(bitstring)
    child = ""
    bitstring.size.times do |i|
      bit = bitstring[i].chr
      child << ((rand()<@p_mutation) ? ((bit=='1') ? "0" : "1") : bit)
    end
    return child
  end

  def reproduce(p1, p2)
    child = {}
    child[:bitstring] = uniform_crossover(p1[:bitstring], p2[:bitstring])
    child[:bitstring] = point_mutation(child[:bitstring])
    return child
  end

  # Take two parents from the input queue and emit two children
  def execute
    parent1 = @queue_in.pop
    parent2 = @queue_in.pop
    @queue_out.push(reproduce(parent1, parent2))
    @queue_out.push(reproduce(parent2, parent1))
  end
end

def random_bitstring(num_bits)
  return (0...num_bits).inject(""){|s,i| s<<((rand<0.5) ? "1" : "0")}
end

def search(population_size=100, num_bits=64)
  # create the pipeline
  eval = EvalFlowUnit.new
  stopcondition = StopConditionUnit.new(eval.queue_out)
  selection = SelectFlowUnit.new(stopcondition.queue_out)
  variation = VariationFlowUnit.new(selection.queue_out, eval.queue_in)
  # push random solutions into the pipeline
  population_size.times do
    solution = {:bitstring=>random_bitstring(num_bits)}
    eval.queue_in.push(solution)
  end
  stopcondition.thread.join
  return stopcondition.best
end

if __FILE__ == $0
  best = search()
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:bitstring]}"
end

Listing 9.2: Genetic Algorithm in Ruby using Flow Programming



9.1.4 Other Paradigms 

A number of popular and common programming paradigms have been 
considered in this section, although many more have not been described. 

Many programming paradigms are not appropriate for implementing
algorithms as-is, but may be useful with the algorithm as a component in a
broader system, such as Agent-Oriented Programming where the algorithm
may be a procedure available to the agent. Meta-programming is a case where
the capabilities of the paradigm may be used for parts of an algorithm
implementation, such as the manipulation of candidate programs in Genetic
Programming (Section 3.3). Aspect-Oriented Programming could be layered
over an object-oriented algorithm implementation and used to separate the
concerns of termination conditions and best solution logging.

Other programming paradigms provide variations on what has already
been described, such as Functional Programming which would be similar to
the procedural example, and Event-Driven Programming that would not
be too dissimilar in principle to Flow-Based Programming. Another
example is the popular idiom of Map-Reduce which is an application of
functional programming principles organized into a data flow model.

Finally, there are programming paradigms that are not relevant or
feasible to consider for implementing algorithms, such as Logic Programming.
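
The Map-Reduce idiom mentioned above can be sketched for the evaluation step of a population-based algorithm: candidates are scored independently (map) and the scored candidates are folded into a single best candidate (reduce). The function names are assumptions for illustration, and OneMax is used only as a convenient cost function.

```ruby
# OneMax cost: the number of '1' characters in the string
def onemax(bitstring)
  bitstring.count("1")
end

# Map: score every candidate independently of the others.
# Reduce: fold the scored candidates down to the single best one
# (the earlier candidate wins a tie).
def best_by_map_reduce(bitstrings)
  scored = bitstrings.map {|b| {:bitstring=>b, :fitness=>onemax(b)}}
  scored.reduce {|best, c| (c[:fitness] > best[:fitness]) ? c : best}
end

if __FILE__ == $0
  best = best_by_map_reduce(["0101", "1110", "0000"])
  puts "best: f=#{best[:fitness]}, s=#{best[:bitstring]}"
end
```

Because the map step has no shared state, the scoring of candidates could be distributed across workers without changing the reduce step.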

9.2 Devising New Algorithms

This section provides a discussion of some of the approaches that may be 
used to devise new algorithms and systems inspired by biological systems for 
addressing mathematical and engineering problems. This discussion covers: 

• An introduction to adaptive systems and complex adaptive systems as
an approach for studying natural phenomena and deducing adaptive
strategies that may be the basis for algorithms (Section 9.2.1).

• An introduction to some frameworks and methodologies for reducing 
natural systems into abstract information processing procedures and 
ultimately algorithms (Section 9.2.2). 

• A summary of a methodology that may be used to investigate a devised 
adaptive system that considers the trade-off in model fidelity and 
descriptive power proposed by Goldberg, a pioneer in the Evolutionary 
Computation field (Section 9.2.3). 

9.2.1 Adaptive Systems 

Many algorithms, such as the Genetic Algorithm, have come from the study
and models of complex and adaptive systems. Adaptive systems research
provides a methodology by which these systems can be systematically
investigated resulting in adaptive plans or strategies that can provide the
basis for new and interesting algorithms.

Holland proposed a formalism in his seminal work on adaptive systems
that provides a general manner in which to define an adaptive system [7].
Phrasing systems in this way provides a framework under which adaptive
systems may be evaluated and compared relative to each other, the
difficulties and obstacles of investigating specific adaptive systems are exposed,
and the abstracted principles of different system types may be distilled.
This section provides a summary of Holland's seminal adaptive systems
formalism and considers clonal selection as an example of an adaptive plan.

Adaptive Systems Formalism 

This section presents a brief review of Holland's adaptive systems formalism 
described in [7] (Chapter 2). This presentation focuses particularly on 
the terms and their description, and has been hybridized with the concise 
presentation of the formalism by De Jong [9] (page 6). The formalism is 
divided into sections: 1) Primary Objects summarized in Table 9.1, and 
2) Secondary Objects summarized in Table 9.2. Primary Objects are the 
conventional objects of an adaptive system: the environment e, the strategy 
or adaptive plan that creates solutions in the environment s, and the utility 
assigned to created solutions U.

Term  Object       Description
e     Environment  The environment of the system undergoing adaptation.
s     Strategy     The adaptive plan which determines successive
                   structural modifications in response to the
                   environment.
U     Utility      A measure of performance or payoff of different
                   structures in the environment. Maps a given solution
                   (A) to a real number evaluation.



Table 9.1: Primary Objects in the adaptive systems formalism. 



Secondary Objects extend beyond the primary objects providing the
detail of the formalism. These objects suggest a broader context than
that of the instance specific primary objects, permitting the evaluation and
comparison of sets of objects such as plans (S), environments (E), search
spaces (A), and operators (O).

A given adaptive plan acts in discrete time t, which is a useful
simplification for analysis and computer simulation. A framework for a given
adaptive system requires the definition of a set of strategies S, a set of
environments E, and criterion for ranking strategies X. A given adaptive
plan is specified within this framework given the following set of objects: a
search space A, a set of operators O, and feedback from the environment I.
Holland proposed a series of fundamental questions when considering the
definition for an adaptive system, which he rephrases within the context of
the formalism (see Table 9.3).
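
The objects of the formalism can be made concrete with a small sketch of an adaptive plan acting in discrete time. This is an illustrative reading of the terms rather than Holland's own presentation: the search space A is taken to be fixed-length bitstrings, the utility U is assumed to be OneMax, the operator set O contains two arbitrary operators, and the feedback I is reduced to the payoff alone. All names in the code are assumptions.

```ruby
# U: utility of a structure in the environment e.
# OneMax (the count of '1' characters) is an illustrative assumption.
def utility(structure)
  structure.count("1")
end

# O: the set of operators, each mapping a structure A_t to a candidate A_{t+1}.
def operators
  flip_random_bit = lambda do |a|
    b = a.dup
    i = rand(a.size)
    b[i] = (a[i] == "1") ? "0" : "1"
    b
  end
  rotate_left = lambda {|a| a[1..-1] + a[0]}
  [flip_random_bit, rotate_left]
end

# s: an adaptive plan. At each time step t it selects an operator from O,
# applies it to the current structure, and uses the feedback I (here only
# the payoff) to decide whether to retain the modified structure.
def adaptive_plan(initial, steps)
  current = initial
  ops = operators
  steps.times do |t|
    candidate = ops[rand(ops.size)].call(current)
    feedback = utility(candidate)
    current = candidate if feedback >= utility(current)
  end
  return current
end

if __FILE__ == $0
  result = adaptive_plan("01010101", 200)
  puts "final structure: #{result}, utility=#{utility(result)}"
end
```

In this reading, comparing two plans under the criterion X would amount to running each plan over a set of environments (different utility functions) and ranking the outcomes.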

Some Examples 

Holland provides a series of illustrations rephrasing common adaptive sys- 
tems in the context of the formalism [7] (pages 35-36). Examples include: 
genetics, economics, game playing, pattern recognition, control, function 
optimization, and the central nervous system. The formalism is applied to 
investigate his schemata theorem, reproductive plans, and genetic plans. 
These foundational models became the field of Evolutionary Computation 
(Chapter 3). 

From working within the formalism, Holland makes six observations 
regarding obstacles that may be encountered whilst investigating adaptive 
systems [7] (pages 159-160): 

• High cardinality of A: makes searches long and storage of relevant
data difficult.

• Appropriateness of credit: knowledge of the properties about 'successful'
structures is incomplete, making it hard to predict good future
structures from past structures.

• High dimensionality of U on an e: performance is a function of a
large number of variables which is difficult for classical optimization
methods.

• Non-linearity of U on an e: many false optima or false peaks, resulting
in the potential for a lot of wasted computation.

• Mutual interference of search and exploitation: the exploration
(acquisition of new information), exploitation (application of known
information) trade-off.

• Relevant non-payoff information: the environment may provide a lot
more information in addition to payoff, some of which may be relevant
to improved performance.

Term  Object        Description
A     Search Space  The set of attainable structures, solutions, and the
                    domain of action for an adaptive plan.
E     Environments  The range of different environments, where e is an
                    instance. It may also represent the unknowns of the
                    strategy about the environment.
O     Operators     Set of operators applied to an instance of A at time
                    t (A_t) to transform it into A_{t+1}.
S     Strategies    Set of plans applicable for a given environment
                    (where s is an instance), that use operators from
                    the set O.
X     Criterion     Used to compare strategies (in the set S), under
                    the set of environments (E). Takes into account the
                    efficiency of a plan in different environments.
I     Feedback      Set of possible environmental inputs and signals
                    providing dynamic information to the system about the
                    performance of a particular solution A in a particular
                    environment E.
M     Memory        The memory or retained parts of the input history
                    (I) for a solution (A).

Table 9.2: Secondary Objects in the adaptive systems formalism.

Cavicchio provides perhaps one of the first applications of the formalism 
(after Holland) in his dissertation investigating Holland's reproductive plans 
[10] (and to a lesser extent in [11]). The work summarizes the formalism, 
presenting essentially the same framework, although he provides a special- 
ization of the search space A. The search space is broken down into a 
representation (codes), solutions (devices), and a mapping function from 
Question                                                          Formal

To what parts of its environment is the organism (system,         What is E?
organization) adapting?

How does the environment act upon the adapting organism           What is I?
(system, organization)?

What structures are undergoing adaptation?                        What is A?

What are the mechanisms of adaptation?                            What is O?

What part of the history of its interaction with the environment  What is M?
does the organism (system, organization) retain in addition to
that summarized in the structure tested?

What limits are there to the adaptive process?                    What is S?

How are different (hypotheses about) adaptive processes to be     What is X?
compared?



Table 9.3: Questions when investigating adaptive systems, taken from [7] 
(pg. 29). 



representation to solutions. The variation highlights the restriction the 
representation and mapping have on the designs available to the adaptive 
plan. Further, such mappings may not be one-to-one: there may be many 
instances in the representation space that map to the same solution (or the 
reverse). 
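
The many-to-one case can be illustrated with a minimal Ruby sketch. The decode function and its modulus-based encoding are assumptions made for illustration only; they are not taken from Cavicchio's work.

```ruby
# Hypothetical decoder: maps a bitstring (representation) to an integer
# solution in the range 0...modulus. The encoding is an assumption.
def decode(bitstring, modulus=4)
  bitstring.to_i(2) % modulus
end

# Enumerate every 3-bit code and group the codes by the solution they map to.
codes = (0...8).map { |i| i.to_s(2).rjust(3, "0") }
mapping = codes.group_by { |code| decode(code) }

mapping.each do |solution, group|
  puts "solution #{solution} <= codes #{group.join(', ')}"
end
```

Eight codes map onto four solutions, so an adaptive plan searching the code space cannot distinguish between codes that decode to the same solution.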

Although not explicitly defined, Holland's specification of structures A is 
clear in pointing out that the structures are not bound to a level of abstrac- 
tion; the definition covers structures at all levels. Nevertheless, Cavicchio's 
specialization for a representation-solution mapping was demonstrated to be 
useful in his exploration of reproductive plans (early Genetic Algorithms). 
He proposed that an adaptive system is first order if the utility function U 
for structures on an environment encompasses feedback I. 

Cavicchio described the potential independence (component-wise) and 
linearity of the utility function with respect to the representation used. 
De Jong also employed the formalism to investigate reproductive plans in 
his dissertation research [9]. He indicated that the formalism covers the 
essential characteristics of adaptation, where the performance of a solution 
is a function of its characteristics and its environment. Adaptation is defined 
as a strategy for generating better-performing solutions to a problem by 
reducing initial uncertainty about the environment via feedback from the 
evaluation of individual solutions. De Jong used the formalism to define a 
series of genetic reproductive plans, which he investigated in the context of 
function optimization. 
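
The idea of adaptation as feedback-driven improvement can be sketched as a small loop in Ruby. This is only an illustrative reading of the formalism, not a plan from Holland or De Jong: the bit-flip operator, the accept-if-no-worse rule, and all function names are assumptions.

```ruby
# Environment e: scores a structure (here, the number of '1' bits).
def utility(structure)
  structure.count("1")
end

# An operator from O: flip one randomly chosen bit of the structure.
def flip_random_bit(structure, rng)
  i = rng.rand(structure.size)
  copy = structure.dup
  copy[i] = (copy[i] == "1") ? "0" : "1"
  copy
end

# Adaptive plan: apply an operator to A(t), observe the feedback, and keep
# the candidate as A(t+1) only when it performs at least as well.
def adaptive_plan(initial, steps, rng=Random.new(1))
  current = initial
  memory = [utility(current)] # M: retained history of feedback (I)
  steps.times do
    candidate = flip_random_bit(current, rng)
    feedback = utility(candidate)
    current = candidate if feedback >= memory.last
    memory << utility(current)
  end
  [current, memory]
end

final, history = adaptive_plan("0000000000", 200)
puts "final structure: #{final} utility: #{history.last}"
```

The retained history never decreases, because a candidate is only accepted when its feedback is no worse than that of the current structure.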
Complex Adaptive Systems 

Adaptive strategies are typically complex because they result in irreducible 
emergent behaviors that occur as a result of the non-linear interactions of 
system components. The study of Complex Adaptive Systems (CAS) is 
the study of high-level abstractions of natural and artificial systems that 
are generally impervious to traditional analysis techniques. Macroscopic 
patterns emerge from the dynamic and non-linear interactions of the sys- 
tem's low-level (microscopic) adaptive agents. The emergent patterns are 
more than the sum of their parts. As such, traditional reductionist method- 
ologies fail to describe how the macroscopic patterns emerge. Holistic and 
totalistic investigatory approaches are applied that relate the simple rules 
and interactions of the simple adaptive agents to their emergent effects in a 
'bottom-up' manner. 

Some relevant examples of CAS include: the development of embryos, 
ecologies, genetic evolution, thinking and learning in the brain, weather 
systems, social systems, insect swarms, bacteria becoming resistant to an 
antibiotic, and the function of the adaptive immune system. 

The field of CAS was founded at the Santa Fe Institute (SFI), in the 
late 1980s by a group of physicists, economists, and others interested in 
the study of complex systems in which the agents of those systems change 
[1]. One of the most significant contributors to the inception of the field 
from the perspective of adaptation was Holland. He was interested in the 
question of how computers could be programmed so that problem-solving 
capabilities are built up by specifying "what is to be done" (inductive 
information processing) rather than "how to do it" (deductive information 
processing). In the 1992 reprint of his book he provided a summary of CAS 
with a computational example called ECHO [7]. His work on CAS was 
expanded in a later book which provided an in-depth study of the topic [8]. 

There is no clear definition of a Complex Adaptive System; rather, there 
are sets of parsimonious principles and properties, with many different 
researchers in the field defining their own nomenclature. Popular definitions 
beyond Holland's work include those of Gell-Mann [4] and Arthur [2]. 

9.2.2 Biologically Inspired Algorithms 

Explicit methodologies have been devised and used for investigating natural 
systems with the intent of devising new computational intelligence tech- 
niques. This section introduces two such methodologies taken from the field 
of Artificial Immune Systems (Chapter 7). 

Conceptual Framework 

Although the progression from an inspiring biological system to an inspired 
computation system may appear to be an intuitive process, it can involve 
problems of standardization of nomenclature, effective abstraction and 
departure from biology, and rigor. Stepney, et al. caution that by following 
a process that lacks the detail of modeling, one may fall into the trap of 
reasoning by metaphor [12-14]. 

Besides the lack of rigor, the trap suggests that such reasoning and lack 
of objective analysis limits and biases the suitability and applicability of 
resultant algorithms. They propose that many algorithms in the field of 
Artificial Immune Systems (and beyond) have succumbed to this trap. This 
observation resulted in the development and application of a conceptual 
framework to provide a general process that may be applied in the field 
of Biological Inspired Computation toward realizing Biological Inspired 
Computational Intelligence systems. 

The conceptual framework is comprised of the following actors and steps: 

1. Biological System: The driving motivation for the work that possesses 
some innate information processing qualities. 

2. Probes: Observations and experiments that provide a partial or noisy 
perspective of the biological system. 

3. Models: From probes, abstract and simplified models of the informa- 
tion processing qualities of the system are built and validated. 

4. Framework: Built and validated analytical computational frameworks. 
Validation may use mathematical analysis, benchmark problems, and 
engineering demonstration. 

5. Algorithms: The framework provides the principles for designing and 
analyzing algorithms that may be general and applicable to domains 
unrelated to the biological motivation. 

Immunology as Information Processing 

Forrest and Hofmeyr summarized their AIS research efforts at the University 
of New Mexico and the Santa Fe Institute as "immunology as information 
processing" [3]. They define information as spatio-temporal patterns that 
can be abstracted and described independent of the biological system and 
information processing as computation with these patterns. They proposed 
that such patterns are encoded in the proteins and other molecules of the 
immune system, and that they govern the behavior of the biological system. 
They suggest that their information processing perspective can be con- 
trasted with the conventional structural perspective of cellular interactions 
as mechanical devices. They consider a simple four-step procedure for the 
investigation of immunology as information processing, transitioning from 
the biological system to a usable computational tool: 

1. Identify a specific mechanism that appears to be interesting computa- 
tionally. 
2. Write a computer program that implements or models the mechanism. 

3. Study its properties through simulation and mathematical analysis. 

4. Demonstrate capabilities either by applying the model to a biological 
question of interest or by showing how it can be used profitably in a 
computer science setting. 

The procedure is similar to that outlined in the conceptual framework for 
Biologically Inspired Algorithms in that, in addition to identifying biological 
mechanisms (input) and demonstrating resultant algorithms (output), 
the procedure 1) highlights the need for abstraction involving modeling the 
identified mechanism, and 2) highlights the need to analyze the models 
and abstractions. The procedure of Forrest and Hofmeyr can be used to 
specialize the conceptual framework of Stepney et al. by clearly specifying 
the immunological information processing focus. 

9.2.3 Modeling a New Strategy 

Once an abstract information processing system is devised it must be 
investigated in a systematic manner. There is a range of modeling techniques 
for such a system, from weak and rapid to realize to strong and slow to realize. 
This section considers the trade-offs in modeling an adaptive technique. 

Engineers and Mathematicians 

Goldberg describes the airplane and other products of engineering as ma- 
terial machines, and distinguishes them from the engineering of genetic 
algorithms and other adaptive systems as conceptual machines. He argues 
the methodological distinction between the two is counter-productive and 
harmful from the perspective of conceptual machines, specifically that the 
methodology of the material is equally applicable to that of the conceptual 
[5]. 

The obsession with mathematical rigor in computer science, although 
extremely valuable, is not effective in the investigation of adaptive systems 
given their complexity. Goldberg cites the airplane as an example where 
the engineering invention is used and trusted without a formal proof that 
the invention works (that an airplane can fly).^ 

This defense leads to what Goldberg refers to as the economy of design, 
which is demonstrated with a trade-off that distinguishes 'model description' 
(mathematician-scientists) that is concerned with model fidelity, and model 
prescription (engineer-inventor) that is concerned with a working product. 
In descriptive modeling the model is the thing whereas in 'prescriptive 
modeling', the object is the thing. In the latter, the model (and thus its 
utility) serves the object; in the former, model accuracy may be of primary 
concern. This economy of modeling provides a perspective that distinguishes 
the needs of the prescriptive and descriptive fields of investigation. 

^Goldberg is quick to point out that sets of equations do exist for various aspects of 
flight, although no integrated mathematical proof for airplane flight exists. 

The mathematician-scientist is interested in increasing model accuracy at 
the expense of speed (slow), whereas the engineer may require a marginally 
predictive (less accurate) model relatively quickly. This trade-off between 
high-cost high-accuracy models and low-cost low-fidelity models is what 
may be referred to as the modeling spectrum that assists in selecting an 
appropriate level of modeling. Goldberg proposes that the field of Genetic 
Algorithms expends too much effort at either end of this spectrum. There 
is much work where there is an obsession with blind-prototyping many 
different tweaks in the hope of striking it lucky with the right mechanism, 
operator, or parameter. Alternatively, there is also an obsession with 
detailed mathematical models such as differential equations and Markov 
chains. The middle ground of the spectrum, what Goldberg refers to as little 
models, is a valuable economic modeling consideration for the investigation 
of conceptual machines to "do good science through good engineering". 

Methodology 

The methodology has been referred to as post-modern systems engineering 
and is referred to by Goldberg as a methodology of innovation [6]. The core 
principles of the process are as follows: 

1. Decomposition: Decompose the large problem approximately and 
intuitively, breaking into quasi-separate sub-problems (as separate as 
possible). 

2. Modeling: Investigate each sub-problem separately (or as separate as 
possible) using empirical testing coupled with adequately predictive, 
low-cost models. 

3. Integration: Assemble the sub-solutions and test the overall invention, 
paying attention to unforeseen interactions between the sub-problems. 

Decomposition Problem decomposition and decomposition design is an 
axiom of reductionism and is at the very heart of problem solving in com- 
puter science. In the context of adaptive systems, one may consider the base 
or medium on which the system is performing its computation mechanisms, 
the so-called building blocks of information processing. A structural decom- 
position may involve the architecture and data structures of the system. 
Additionally, one may also consider a functional breakdown of mechanisms 
such as the operators applied at each discrete step of an algorithmic process. 
The reductions achieved provide the basis of investigation and modeling. 
Small Models Given the principle of the economy of modeling presented 
as a spectrum, one may extend the description of each of the five presented 
model types. Small Models refers to the middle of the spectrum, specifically 
to the application of dimensional and facet-wise models. These are 
mid-range quantitative models that make accurate predictions over a limited 
range of states at moderate cost. Once derived, this class of models generally 
requires a small amount of formal manipulation and large amounts of data 
for calibration and verification. The following summarizes the modeling 
spectrum: 

• Unarticulated Wisdom: (low-cost, high-error) Intuition, what is used 
when there is nothing else. 

• Articulated Qualitative Models: Descriptions of mechanisms, graphical 
representations of processes and/or relationships, empirical observation 
or statistical data collection and analysis. 

• Dimensional Models: Investigate dimensionless parameters of the 
system. 

• Facet-wise Models: Investigation of a decomposition element of a 
model in relative isolation. 

• Equations of Motion: (high-cost, low-error) Differential equations and 
Markov chains. 

Facet-wise models are an exercise in simple mathematics that may 
be used to investigate a decomposition element of a model in relative 
isolation. They are based on the idea of bracketing high-order phenomena 
by simplifying or making assumptions about the state of the system. An 
example used by Goldberg from fluid mechanics is a series of equations 
that simplify the model by assuming that a fluid or gas has no viscosity, 
which matches no known substance. A common criticism of this modeling 
approach is "system X doesn't work like that, the model is unrealistic." The 
source of such concerns with adaptive systems is that their interactions are 
typically high-dimensional and non-linear. Goldberg's response is that for a 
given poorly understood area of research, any 'useful' model is better than 
no model. Dimensional analysis or the so-called dimensional reasoning and 
scaling laws are another common conceptual tool in engineering and the 
sciences. Such models may be used to investigate dimensionless parameters 
of the system, which may be considered the formalization of the systemic 
behaviors. 
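
As a hedged illustration of a facet-wise model (an example constructed here, not one given by Goldberg), consider the point mutation operator of a Genetic Algorithm in isolation. Bracketing selection and crossover, the expected number of flipped bits in a string of length n is simply n multiplied by the mutation rate, which can be checked against simulation. The function names and the re-implementation of point mutation are assumptions.

```ruby
# Point mutation: each bit flips independently with probability rate.
# Re-implemented here (an assumption) so the sketch is self-contained.
def point_mutation(bitstring, rate, rng)
  bitstring.chars.map { |bit|
    (rng.rand < rate) ? ((bit == "1") ? "0" : "1") : bit
  }.join
end

# Facet-wise model: ignoring selection and crossover entirely, the expected
# number of flipped bits in a string of length n is n * rate.
def expected_flips(n, rate)
  n * rate
end

# Empirical estimate: mutate an all-zero parent many times and average the
# number of '1' bits (each '1' is one flip).
def observed_flips(n, rate, trials, rng=Random.new(42))
  parent = "0" * n
  total = 0
  trials.times { total += point_mutation(parent, rate, rng).count("1") }
  total.to_f / trials
end

n, rate = 64, 1.0 / 64
puts "model:    #{expected_flips(n, rate)}"
puts "observed: #{observed_flips(n, rate, 2000)}"
```

The model says nothing about how mutation interacts with the rest of the algorithm, which is exactly the simplification a facet-wise model accepts in exchange for being cheap to derive and check.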

Integration Integration is a unification process of combining the findings 
of various models together to form a patch-quilt coherent theory of the 
system. Integration is not limited to holistic unification, and one may 
address specific hypotheses regarding the system resulting in conclusions 
about existing systems and design decisions pertaining to the next generation 
of systems. 

Application In addition to elucidating the methodology, Goldberg speci- 
fies a series of five useful heuristics for the application of the methodology 
(taken from [5], page 8): 

1. Keep the goal of a working conceptual machine in mind. Experimenters 
commonly get side tracked by experimental design and statistical 
verification; theoreticians get side tracked with notions of mathematical 
rigor and model fidelity. 

2. Decompose the design ruthlessly. One cannot address the analytical 
analysis of a system like a Genetic Algorithm in one big 'gulp'. 

3. Use facet-wise models with almost reckless abandon. One should build 
easy models that can be solved by bracketing everything that gets in 
the way. 

4. Integrate facet-wise models using dimensional arguments. One can 
combine many small models together in a patch-quilt manner and 
defend the results of such models using dimensional analysis. 

5. Build high-order models when small models become inadequate. Add 
complexity to models as complexity is needed (economy of modeling). 

9.2.4 Bibliography 

[1] P. W. Anderson, K. J. Arrow, and D. Pines. Proceedings of The Santa 
Fe Institute Studies in the Sciences of Complexity - Economy As an 
Evolving Complex System. Addison Wesley Publishing Company, USA, 
1988. 

[2] W. B. Arthur. Introduction: Process and emergence in the economy. 
In S. Durlauf and D. A. Lane, editors, The Economy as an Evolving 
Complex System II, volume XXVII. Addison-Wesley Pub. Co, 
Reading, Mass, USA, 1997. 

[3] S. Forrest and S. A. Hofmeyr. Immunology as information processing. 
In Design Principles for the Immune System and Other Distributed 
Autonomous Systems, pages 361-388. Oxford University Press, New 
York, 2001. 

[4] M. Gell-Mann. Complex adaptive systems. In D. Pines and D. Meltzer, 
editors, Complexity: metaphors, models, and reality, pages 17-45. 
Addison-Wesley, USA, 1994. 






[5] D. E. Goldberg. From genetic and evolutionary optimization to the 
design of conceptual machines. Evolutionary Optimization, 1(1):1-12, 
1999. 

[6] D. E. Goldberg. The design of innovating machines: A fundamental 
discipline for a postmodern systems engineering. In Engineering Systems 
Symposium. MIT Engineering Systems Division, USA, 2004. 

[7] J. H. Holland. Adaptation in natural and artificial systems: An in- 
troductory analysis with applications to biology, control, and artificial 
intelligence. University of Michigan Press, 1975. 

[8] J. H. Holland. Hidden Order: How Adaptation Builds Complexity. 
Addison Wesley Publishing Company, USA, 1995. 

[9] K. A. De Jong. An analysis of the behavior of a class of genetic adaptive 
systems. PhD thesis, University of Michigan, Ann Arbor, MI, USA, 
1975. 

[10] D. J. Cavicchio Jr. Adaptive Search Using Simulated Evolution. PhD 
thesis, The University of Michigan, 1970. 

[11] D. J. Cavicchio Jr. Reproductive adaptive plans. In Proceedings of the 
ACM annual conference, volume 1, New York, NY, USA, 1972. ACM. 

[12] S. Stepney, R. E. Smith, J. Timmis, and A. M. Tyrrell. Towards a 
conceptual framework for artificial immune systems. In V. Cutello, P. J. 
Bentley, and J. Timmis, editors, Lecture Notes in Computer Science, 
pages 53-64. Springer-Verlag, Germany, 2004. 

[13] S. Stepney, R. E. Smith, J. Timmis, A. M. Tyrrell, M. J. Neal, and 
A. N. W. Hone. Conceptual frameworks for artificial immune systems. 
International Journal of Unconventional Computing, 1(3):315-338, July 
2005. 

[14] J. Twycross and U. Aickelin. Towards a conceptual framework for 
innate immunity. In Lecture Notes in Computer Science, pages 112-125. 
Springer, Germany, 2005. 






9.3 Testing Algorithms 

This section provides an introduction to software testing and the testing of 
Artificial Intelligence algorithms. Section 9.3.1 introduces software testing 
and focuses on a type of testing relevant to algorithms called unit testing. 
Section 9.3.2 provides a specific example of an algorithm and a prepared 
suite of unit tests, and Section 9.3.3 provides some rules-of-thumb for testing 
algorithms in general. 

9.3.1 Software Testing 

Software testing in the field of Software Engineering is a process in the 
life-cycle of a software project that verifies that the product or service meets 
quality expectations and validates that software meets the requirements 
specification. Software testing is intended to locate defects in a program, 
although a given testing method cannot guarantee to locate all defects. As 
such, it is common for an application to be subjected to a range of testing 
methodologies throughout the software life-cycle, such as unit testing during 
development, integration testing once modules and systems are completed, 
and user acceptance testing to allow the stakeholders to determine if their 
needs have been met. 

Unit testing is a type of software testing that involves the preparation 
of well-defined procedural tests of discrete functionality of a program that 
provide confidence that a module or function behaves as intended. Unit 
tests are referred to as 'white-box' tests (contrasted to 'black-box' tests) 
because they are written with full knowledge of the internal structure of 
the functions and modules under test. Unit tests are typically prepared by 
the developer who wrote the code under test and are commonly automated, 
themselves written as small programs that are executed by a unit testing 
framework (such as JUnit for Java or the Test::Unit framework in Ruby). The 
objective is not to test each path of execution within a unit (called complete- 
test or complete-code coverage), but instead to focus tests on areas of risk, 
uncertainty, or criticality. Each test focuses on one aspect of the code (test 
one thing), and tests are commonly organized into test suites of commonality. 

Some of the benefits of unit testing include: 

• Documentation: The preparation of a suite of tests for a given 
system provides a type of programming documentation highlighting the 
expected behavior of functions and modules and providing examples 
of how to interact with key components. 

• Readability: Unit testing encourages a programming style of small 
modules, clear input and output, and fewer inter-component 
dependencies. Code written for ease of testing (testability) may be easier 
to read and follow. 
• Regression: Together, the suite of tests can be executed as a regression- 
test of the system. The automation of the tests means that any defects 
caused by changes to the code can easily be identified. When a defect 
is found that slipped through, a new test can be written to ensure it 
will be identified in the future. 

Unit tests were traditionally written after the program was completed. 
A popular alternative is to prepare the tests before the functionality of 
the application is prepared, called Test-First or Test-Driven Development 
(TDD). In this method, the tests are written and executed, failing until 
the application functionality is written to make the test pass. The early 
preparation of tests allows the programmer to consider the behavior required 
from the program and the interfaces and functions the program needs to 
expose before they are written. 
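
The test-first cycle can be sketched in plain Ruby without a testing framework. The hamming_distance function and the passes? helper are hypothetical names chosen for illustration; in practice the test would live in a Test::Unit test case.

```ruby
# Step 1: write the test first. hamming_distance is a hypothetical function
# chosen for illustration; it does not exist yet when the test is written.
test_hamming = lambda do
  raise "expected 0" unless hamming_distance("0000", "0000") == 0
  raise "expected 2" unless hamming_distance("0000", "0101") == 2
  true
end

# Helper that runs a test and reports whether it passed.
def passes?(test)
  test.call
rescue StandardError
  false
end

before = passes?(test_hamming)
puts "before implementation: #{before}"

# Step 2: write just enough functionality to make the test pass.
def hamming_distance(a, b)
  a.chars.zip(b.chars).count { |x, y| x != y }
end

after = passes?(test_hamming)
puts "after implementation: #{after}"
```

The first run fails because the function is undefined, and the second passes once the behavior described by the test has been implemented.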

The concerns of software testing are very relevant to the development, 
investigation, and application of Metaheuristic and Computational Intelli- 
gence algorithms. In particular, the strong culture of empirical investigation 
and prototype-based development demands a baseline level of trust in the 
systems that are presented in articles and papers. Trust can be instilled in 
an algorithm by assessing the quality of the algorithm implementation itself. 
Unit testing is lightweight (requiring only the writing of automated test 
code) and meets the needs of promoting quality and trust in the code while 
prototyping and developing algorithms. It is strongly suggested as a step in 
the process of empirical algorithm research in the fields of Metaheuristics, 
Computational Intelligence, and Biologically Inspired Computation. 

9.3.2 Unit Testing Example 

This section provides an example of an algorithm and its associated unit 
tests as an illustration of the presented concepts. The implementation of 
the Genetic Algorithm is discussed from the perspective of algorithm testing 
and an example set of unit tests for the Genetic Algorithm implementation 
is presented as a case study. 

Algorithm 

Listing 3.1 in Section 3.2 provides the source code for the Genetic Algorithm 
in the Ruby Programming Language. An important consideration when using 
the Ruby test framework is ensuring that the functions of the algorithm 
are exposed for testing and that the algorithm demonstration itself does 
not execute. This is achieved through the use of the (if __FILE__ == $0) 
condition, which ensures the example only executes when the file is called 
directly, allowing the functions to be imported and executed independently 
by a unit test script. The algorithm is very modular with its behavior 
partitioned into small functions, most of which are independently testable. 
The reproduce function has some dependencies, although its orchestration 
of sub-functions is still testable. The search function is the only 
monolithic function; it depends on all other functions in the implementation 
(directly or indirectly) and hence is difficult to unit test. At best, 
the search function may be a case for system testing addressing functional 
requirements, such as "does the algorithm deliver optimized solutions?". 
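
The guard condition can be sketched as follows. The file name, the demo function, and the simplified onemax are assumptions made so the example is self-contained; they stand in for Listing 3.1 rather than reproduce it.

```ruby
# genetic_algorithm_sketch.rb (hypothetical file name)

# A function exposed for testing; mirrors the onemax objective from the text.
def onemax(bitstring)
  bitstring.count("1")
end

# Stand-in for the algorithm demonstration (not the book's search function).
def demo
  puts "onemax('1011') = #{onemax('1011')}"
end

# Run the demonstration only when this file is executed directly, e.g.
# `ruby genetic_algorithm_sketch.rb`. When the file is loaded with require
# from a test script, __FILE__ differs from $0 and the demo is skipped.
if __FILE__ == $0
  demo
end
```

A unit test script can then require the file and call onemax directly without triggering the demonstration.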

Unit Tests 

Listing 9.3 provides the TC_GeneticAlgorithm class that makes use of the 
built-in Ruby unit testing framework by extending the TestCase class. The 
listing provides an example of ten unit tests for six of the functions in the 
Genetic Algorithm implementation. Two types of unit tests are provided: 

• Deterministic: Directly test the function in question, addressing ques- 
tions such as: does onemax add correctly? and does point_mutation 
behave correctly? 

• Probabilistic: Test the probabilistic properties of the function in ques- 
tion, addressing questions such as: does random_bitstring provide 
an expected 50/50 mixture of 1s and 0s over a large number of cases? 
and does point_mutation make an expected number of changes over 
a large number of cases? 

The tests for probabilistic expectations are a weaker form of unit testing 
that can be used to either provide additional confidence to deterministically 
tested functions, or to be used as a last resort when direct methods cannot 
be used. 

Given that a unit test should 'test one thing', it is common for a given 
function to have more than one unit test. The reproduce function is a 
good example of this, with two tests in the listing. This is because it is a 
larger function whose behavior, carried out by dependent functions, varies 
based on its parameters. 

require "test/unit"
# Assumption: Listing 3.1 is saved alongside this file as genetic_algorithm.rb
require_relative "genetic_algorithm"

class TC_GeneticAlgorithm < Test::Unit::TestCase

  # test that the objective function behaves as expected
  def test_onemax
    assert_equal(0, onemax("0000"))
    assert_equal(4, onemax("1111"))
    assert_equal(2, onemax("1010"))
  end

  # test the creation of random strings
  def test_random_bitstring
    assert_equal(10, random_bitstring(10).size)
    assert_equal(0, random_bitstring(10).delete('0').delete('1').size)
  end

  # test the approximate proportion of 1's and 0's
  def test_random_bitstring_ratio
    s = random_bitstring(1000)
    assert_in_delta(0.5, (s.delete('1').size/1000.0), 0.05)
    assert_in_delta(0.5, (s.delete('0').size/1000.0), 0.05)
  end

  # test that members of the population are selected
  def test_binary_tournament
    pop = Array.new(10) {|i| {:fitness=>i} }
    10.times {assert(pop.include?(binary_tournament(pop)))}
  end

  # test point mutations at the limits
  def test_point_mutation
    assert_equal("0000000000", point_mutation("0000000000", 0))
    assert_equal("1111111111", point_mutation("1111111111", 0))
    assert_equal("1111111111", point_mutation("0000000000", 1))
    assert_equal("0000000000", point_mutation("1111111111", 1))
  end

  # test that the observed changes approximate the intended probability
  def test_point_mutation_ratio
    changes = 0
    100.times do
      s = point_mutation("0000000000", 0.5)
      changes += (10 - s.delete('1').size)
    end
    assert_in_delta(0.5, changes.to_f/(100*10), 0.05)
  end

  # test cloning with crossover
  def test_crossover_clone
    p1, p2 = "0000000000", "1111111111"
    100.times do
      s = crossover(p1, p2, 0)
      assert_equal(p1, s)
      assert_not_same(p1, s)
    end
  end

  # test recombination with crossover
  def test_crossover_recombine
    p1, p2 = "0000000000", "1111111111"
    100.times do
      s = crossover(p1, p2, 1)
      assert_equal(p1.size, s.size)
      assert_not_equal(p1, s)
      assert_not_equal(p2, s)
      s.size.times {|i| assert((p1[i]==s[i]) || (p2[i]==s[i])) }
    end
  end

  # test odd sized population
  def test_reproduce_odd
    pop = Array.new(9) {|i| {:fitness=>i, :bitstring=>"0000000000"} }
    children = reproduce(pop, pop.size, 0, 1)
    assert_equal(9, children.size)
  end

  # test reproduce size mismatch
  def test_reproduce_mismatch
    pop = Array.new(10) {|i| {:fitness=>i, :bitstring=>"0000000000"} }
    children = reproduce(pop, 9, 0, 0)
    assert_equal(9, children.size)
  end
end

Listing 9.3: Unit Tests for the Genetic Algorithm in Ruby 



9.3.3 Rules-of-Thumb 

Unit testing is easy, although writing good unit tests is difficult given the 
complex relationship the tests have with the code under test. Testing 
Metaheuristics and Computational Intelligence algorithms is harder again 
given their probabilistic nature and their ability to 'work in spite of you', 
that is, provide some kind of result even when implemented with defects. 
The following guidelines may help when unit testing an algorithm: 

• Start Small: Some unit tests are better than no unit test and each 
additional test can improve the trust and the quality of the code. For 
an existing algorithm implementation, start by writing a test for a 
small and simple behavior and slowly build up a test suite. 

• Test one thing: Each test should focus on verifying the behavior of 
one aspect of one unit of code. Writing concise and behavior-focused 
unit tests is the objective of the methodology. 

• Test once: A behavior or expectation only needs to be tested once; 
do not repeat a test each time a given unit is tested. 

• Don't forget the I/O: Remember to test the inputs and outputs of a 
unit of code, specifically the pre-conditions and post-conditions. It 
can be easy to focus on the decision points within a unit and forget 
its primary purpose. 

• Write code for testability: The tests should help to shape the code 
they test. Write small functions or modules, think about testing while 
writing code (or write tests first), and refactor code (update code after 
the fact) to make it easier to test. 

• Function independence: Attempt to limit the direct dependence be- 
tween functions, modules, objects and other constructs. This is related 
to testability and writing small functions although suggests limits on 
how much interaction there is between units of code in the algorithm. 
Less dependence means fewer side effects of a given unit of code and 
ultimately less complicated tests. 

• Test Independence: Tests should be independent from each other. 
Frameworks provide hooks to set-up and tear-down state prior to the 
execution of each test, so there should be no need to have one test 
prepare data or state for other tests. Tests should be able to execute 
independently and in any order. 

• Test your own code: Avoid writing tests that verify the behavior
of framework or library code, such as the randomness of a random
number generator or whether a math or string function behaves as
expected. Focus on writing tests for the manipulation of data performed
by the code you have written.

• Probabilistic testing: Metaheuristics and Computational Intelligence
algorithms generally make use of stochastic or probabilistic decisions.
This means that some behaviors are not deterministic and are more
difficult to test. As with the example, write probabilistic tests to verify
that such processes behave as intended. Given that probabilistic tests
are weaker than deterministic tests, consider writing deterministic
tests first. A probabilistic behavior can be made deterministic by
replacing the random number generator with a proxy, called a mock,
that returns deterministic values. This level of testing may require
further changes to the original code so that dependent modules
and objects can be mocked.

• Consider test-first: Writing the tests first can help to crystallize 
expectations when implementing an algorithm from the literature, and 
help to solidify thoughts when developing or prototyping a new idea. 
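
The mock-based approach to probabilistic testing described in the guidelines above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the book: the point_mutation function and the MockRandom class are hypothetical names, and the random source is passed as a parameter so that a test can substitute it.

```ruby
# Hypothetical unit under test: flips each bit with probability `rate`.
# The random source is a parameter so that a test can replace it.
def point_mutation(bits, rate, rng = Random.new)
  bits.map { |bit| rng.rand < rate ? 1 - bit : bit }
end

# Hypothetical mock random source that replays a fixed sequence of values.
class MockRandom
  def initialize(values)
    @values = values.dup
  end

  # Mirrors Random#rand called with no arguments (a float in [0, 1)).
  def rand
    raise "mock exhausted" if @values.empty?
    @values.shift
  end
end

# Deterministic check: only draws below the rate should flip a bit.
mock = MockRandom.new([0.0, 0.9, 0.0, 0.9])
mutated = point_mutation([0, 0, 1, 1], 0.5, mock)
puts mutated.inspect
```

Passing the generator in, rather than calling a global random function inside the unit, is the kind of change to the original code that makes mocking possible.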

9.3.4 References 

For more information on software testing, consult a good book on software
engineering. Two good books dedicated to testing are "Beautiful Testing:
Leading Professionals Reveal How They Improve Software", which provides a
compendium of best practices from professional programmers and testers [2],
and "Software Testing" by Patton, which provides a more traditional
treatment [4].

Unit testing is covered in good books on software engineering or software
testing. Two good books that focus on unit testing are "Test Driven
Development: By Example" on the TDD methodology by Beck, a pioneer
of Extreme Programming and Test Driven Development [1], and "Pragmatic
Unit Testing in Java with JUnit" by Hunt and Thomas [3].

9.4 Visualizing Algorithms

This section considers the role of visualization in the development and
application of algorithms from the fields of Metaheuristics, Computational
Intelligence, and Biologically Inspired Computation. Visualization can be
a powerful technique for exploring the spatial relationships between data
(such as an algorithm's performance over time) and a useful investigatory
tool (such as plotting an objective problem domain or search space).
Visualization can also provide a weak form of algorithm testing, providing
observations of efficiency or efficacy that may be indicative of the expected
algorithm behavior.

This section provides a discussion of the techniques and methods that 
may be used to explore and evaluate the problems and algorithms described 
throughout this book. The discussion and examples in this section are 
primarily focused on function optimization problems, although the principles 
of visualization as exploration (and a weak form of algorithm testing) are 
generally applicable to function approximation problem instances. 

9.4.1 Gnuplot 

Gnuplot is a free open source command line tool used to generate plots
from data. It supports a large number of different plot types and provides
seemingly limitless configurability. Plots are shown to the screen by default,
but the tool can easily be configured to generate image files as well as LaTeX,
PostScript and PDF documents.

Gnuplot can be downloaded from the website, which also provides many
demonstrations of different plot types with sample scripts showing how
the plots were created. There are many tutorials and examples on the
web, and help is provided inside the Gnuplot software by typing help
followed by the command name (for example: help plot). For a more
comprehensive reference on Gnuplot, see Janert's introductory book to the
software, "Gnuplot in Action" [1].

Gnuplot was chosen for the demonstrations in this section as useful plots
can be created with a minimum number of commands. Additionally, it is
easily integrated into a range of scripting languages and is supported on a
range of modern operating systems. All examples in this section include both
the resulting plot and the script used to generate it. The scripts may be typed
directly into the Gnuplot interpreter or into a file which is processed by
the Gnuplot command line tool. The examples in this section provide a
useful starting point for visualizing the problems and algorithms described
throughout this book.

Gnuplot URL: http://www.gnuplot.info

9.4.2 Plotting Problems

The visualization of the problem under study is an excellent start in learning 
about a given domain. A simple spatial representation of the search space 
or objective function can help to motivate the selection and configuration 
of an appropriate technique. 

The visualization method is specific to the problem type and instance 
being considered. This section provides examples of visualizing problems 
from the fields of continuous and combinatorial function optimization, two 
classes of problems that appear frequently in the described algorithms. 

Continuous Function Optimization 

A continuous function optimization problem is typically visualized in two
dimensions as a line where x = input, y = f(input) or three dimensions as
a surface where x, y = input, z = f(input).

Some functions may have many more dimensions, which, if the function
is linearly separable, can be visualized in lower dimensions. Functions that
are not linearly separable may be able to make use of projection techniques
such as Principal Component Analysis (PCA). For example, one could prepare
a stratified sample of the search space as vectors with associated cost
function values and use PCA to project the vectors onto a two-dimensional
plane for visualization.

Similarly, the range of each variable input to the function may be large. 
This may mean that some of the complexity or detail may be lost when 
the function is visualized as a line or surface. An indication of this detail 
may be achieved by creating spot-sample plots of narrow sub-sections of 
the function. 

Figure 9.1 provides an example of the Basin function in one dimension.
The Basin function is a continuous function optimization problem that seeks
$\min f(x)$ where $f = \sum_{i=1}^{n} x_i^2$, $-5.0 \leq x_i \leq 5.0$. The
optimal solution for this function is $(v_0, \ldots, v_{n-1}) = 0.0$.
Listing 9.4 provides the Gnuplot script used to prepare the plot ($n = 1$).

set xrange [-5:5] 
plot x*x 

Listing 9.4: Gnuplot script for plotting a function in one dimension.

Figure 9.2 provides an example of the Basin function in two dimensions
as a three-dimensional surface plot. Listing 9.5 provides the Gnuplot script
used to prepare the surface plot.

set xrange [-5:5] 
set yrange [-5:5] 
set zrange [0:50] 
splot x*x+y*y 



Listing 9.5: Gnuplot script for plotting a function in two dimensions.

Figure 9.2: Plot of the Basin function in two dimensions.

Both plots show the optimum in the center of the domain at x = 0.0 in
one dimension and x, y = 0.0 in two dimensions.
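
The same observation can be checked numerically. The following is a small sketch, assuming the Basin function is implemented as a sum of squares as defined above; the names basin and sample_1d are chosen here for illustration only.

```ruby
# Basin function: the sum of the squared components of a vector.
def basin(vector)
  vector.inject(0.0) { |sum, x| sum + x**2 }
end

# Evaluate the function on an evenly spaced grid over [min, max] in one
# dimension. Positions are derived from integer steps to avoid accumulating
# floating-point error.
def sample_1d(steps = 20, min = -5.0, max = 5.0)
  (0..steps).map do |i|
    x = min + (max - min) * i / steps
    [x, basin([x])]
  end
end

samples = sample_1d
best = samples.min_by { |_, cost| cost }
puts "best sample: x=#{best[0]} f(x)=#{best[1]}"
```

Printing each sample as an "x cost" pair would give a data file that Gnuplot could plot in place of the analytic expression x*x.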



Traveling Salesman Problem 

The Traveling Salesman Problem (TSP) description is comprised of a list of
cities, each with a different coordinate (at least in the case of the symmetric
TSP). This can easily be visualized as a map if the coordinates are latitudes
and longitudes, or as a scatter plot.

A second possible visualization is to prepare a distance matrix (distance 
between each point and all other points) and visualize the matrix directly, 
with each cell shaded relative to the distances of all other cells (largest 
distances darker and the shorter distances lighter). The light areas in the 
matrix highlight short or possible nearest-neighbor cities. 
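
The distance matrix itself is simple to prepare. The sketch below assumes Euclidean distances and uses the first five Berlin52 coordinates that appear in Listing 9.7; the function names are illustrative rather than taken from the book.

```ruby
# Euclidean distance between two cities given as [x, y] pairs.
def euclidean(a, b)
  Math.sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2)
end

# Build the full symmetric distance matrix for a list of cities.
def distance_matrix(cities)
  cities.map { |a| cities.map { |b| euclidean(a, b) } }
end

# Render the matrix as whitespace-separated text, one row per line.
def matrix_to_text(matrix)
  matrix.map { |row| row.map { |d| format("%.1f", d) }.join(" ") }.join("\n")
end

# The first five cities of Berlin52 (see Listing 9.7).
cities = [[565.0, 575.0], [25.0, 185.0], [345.0, 750.0],
          [945.0, 685.0], [845.0, 655.0]]
puts matrix_to_text(distance_matrix(cities))
```

A file in this layout can typically be shaded by Gnuplot using its matrix and image plotting options (for example, plot "distances.txt" matrix with image), although that command and the file name are assumptions here and not one of the book's listings.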

Figure 9.3 provides a scatter plot of the Berlin52 TSP used throughout
the algorithm descriptions in this book. The Berlin52 problem seeks a
permutation of the order to visit cities (called a tour) that minimizes the
total distance traveled. The optimal tour distance for Berlin52 is 7542 units.

Figure 9.3: Plot of the cities of the Berlin52 TSP.

Listing 9.6 provides the Gnuplot script used to prepare the plot, where
berlin52.tsp is a file that contains a listing of the coordinates of all cities,
one city per line separated by white space. Listing 9.7 provides a snippet of
the first five lines of the berlin52.tsp file.

plot "berlin52.tsp"

Listing 9.6: Gnuplot script for plotting the Berlin52 TSP. 



565.0 575.0 
25.0 185.0 
345.0 750.0 
945.0 685.0 
845.0 655.0 



Listing 9.7: Snippet of the berlin52.tsp file. 

The scatter plot shows some clustering of points toward the middle of
the domain as well as many points spaced out near the periphery of the
plot. An optimal solution is not obvious from looking at the plot, although
one can see the potential for nearest-neighbor heuristics and the importance
of structure-preserving operations on candidate solutions.

9.4.3 Plotting Algorithm Performance 

Visualizing the performance of an algorithm can give indications that it is
converging (implemented correctly) and provide insight into its dynamic
behavior. Many algorithms are very simple to implement but exhibit complex
dynamic behavior that is difficult to model and predict beforehand. An
understanding of such behavior and the effects of changing an algorithm's
parameters can be gained through systematic and methodical investigation.
Exploring parameter configurations and plots of an algorithm's
performance can give a quick first-pass approximation of the algorithm's
capability and potentially highlight fruitful areas for focused investigation.

Two quite different perspectives on visualizing algorithm performance 
are: a single algorithm run and a comparison between multiple algorithm 
runs. The visualization of algorithm runs is explored in this section in the 
context of the Genetic Algorithm applied to a binary optimization problem 
called OneMax (see Section 3.2). 

Single Algorithm Run 

The performance of an algorithm over the course of a single run can easily
be visualized as a line graph, regardless of the specific measures used. The
graph can be prepared after algorithm execution has completed, although
many algorithm frameworks provide dynamic line graphs.

Figure 9.4 provides an example line graph, showing the quality of the
best candidate solution located by the Genetic Algorithm each generation
for a single run applied to a 64-bit OneMax problem. Listing 9.8 provides
the Gnuplot script used to prepare the plot, where ga1.txt is a text file
that provides the fitness of the best solution each algorithm iteration on a
new line. Listing 9.9 provides a snippet of the first five lines of the ga1.txt
file.

Figure 9.4: Line graph of the best solution found by the Genetic Algorithm.



set yrange [45:64]
plot "ga1.txt" with linespoints

Listing 9.8: Gnuplot script for creating a line graph. 



45 
45 
47 
48 
48 



Listing 9.9: Snippet of the ga1.txt file.
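
A data file in this one-value-per-line layout can be produced by recording the best fitness at every iteration of a run. The sketch below is an assumption-laden illustration: a simple random bit-flip search stands in for the Genetic Algorithm, and the names onemax and run_search are hypothetical.

```ruby
# OneMax fitness: the number of 1 bits in the bitstring.
def onemax(bits)
  bits.count(1)
end

# Stand-in search (not the Genetic Algorithm): flip one random bit per
# iteration, keep the change if fitness does not get worse, and record the
# fitness of the current best solution at each iteration.
def run_search(num_bits, iterations, rng)
  current = Array.new(num_bits) { rng.rand(2) }
  history = []
  iterations.times do
    candidate = current.dup
    index = rng.rand(num_bits)
    candidate[index] = 1 - candidate[index]
    current = candidate if onemax(candidate) >= onemax(current)
    history << onemax(current)
  end
  history
end

# One fitness value per line; redirect the output to a file to plot it.
history = run_search(64, 30, Random.new(1))
puts history.join("\n")
```

Seeding the generator makes the recorded run repeatable, which is convenient when the same plot needs to be regenerated.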



Multiple Algorithm Runs 

Multiple algorithm runs can provide insight into the tendency of an algorithm
or algorithm configuration on a problem, given the stochastic processes
that underlie many of these techniques. For example, a collection of the
best result observed over a number of runs may be taken as a distribution
indicating the capability of an algorithm for solving a given instance of a
problem. This distribution may be visualized directly.

Figure 9.5 provides a histogram plot showing the best solutions found
and the number of times they were located by the Genetic Algorithm over 100
runs on a 300-bit OneMax function.

Figure 9.5: Histogram of the best solutions found by a Genetic Algorithm.



Listing 9.10 provides the Gnuplot script used to prepare the plot, where
ga2.histogram.txt is a text file that contains discrete fitness values and
the number of times each was discovered by the algorithm over 100 runs.

set yrange [0:17]
set xrange [275:290]
plot "ga2.histogram.txt" with boxes

Listing 9.10: Gnuplot script for creating a histogram.

Listing 9.11 provides a snippet of the first five lines of the
ga2.histogram.txt file.

Listing 9.11: Snippet of the ga2.histogram.txt file.
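
The two-column layout described above (a fitness value followed by a count) can be derived from the list of best scores collected across runs. The following is a minimal sketch; the scores are hypothetical values for illustration and are not data from the book.

```ruby
# Count how many runs finished with each best fitness value.
# Returns an array of [fitness, count] pairs ordered by fitness.
def histogram(best_scores)
  counts = Hash.new(0)
  best_scores.each { |score| counts[score] += 1 }
  counts.sort
end

# Format as "fitness count" lines for a two-column data file.
def histogram_to_text(pairs)
  pairs.map { |fitness, count| "#{fitness} #{count}" }.join("\n")
end

# Hypothetical best scores from seven runs (not data from the book).
scores = [284, 281, 284, 283, 281, 284, 282]
puts histogram_to_text(histogram(scores))
```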






Multiple Distributions of Algorithm Runs 

Algorithms can be compared against each other based on the distributions 
of algorithm performance over a number of runs. This comparison usually 
takes the form of statistical tests that can make meaningful statements 
about the differences between distributions. A visualization of the relative 
difference between the distributions can aid in an interpretation of such 
statistical measures. 

A compact way of representing a distribution is to use a box-and-whisker
plot that partitions the data into quartiles, showing the central tendency
of the distribution, the middle mass of the data (the second and third
quartiles), the limits of the distribution and any outliers. Algorithm run
distributions may be summarized as box-and-whisker plots and plotted
together to spatially show relative performance relationships.

Figure 9.6 provides box-and-whisker plots of the best score distribution 
of 100 runs for the Genetic Algorithm applied to a 300-bit OneMax problem 
with three different mutation configurations. The measure collected from 
each run was the quality of the best candidate solution found. 

Figure 9.6: Box-and-whisker plots of the Genetic Algorithm's performance.

Listing 9.12 provides the Gnuplot script used to prepare the plot, where
the file boxplots1.txt contains a summary of each distribution of results,
one per line, with each line containing the min, first, second, and third
quartiles, and the max values separated by a space. Listing 9.13 provides a
complete listing of the three lines of the boxplots1.txt file.

set bars 15.0
set xrange [-1:3]
plot 'boxplots1.txt' using 0:2:1:5:4 with candlesticks whiskerbars 0.5

Listing 9.12: Gnuplot script for creating a box-and-whisker plot.

Listing 9.13: Complete listing of the boxplots1.txt file.
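
Each line of such a summary file can be computed from the raw results of one configuration. The sketch below assumes that quartiles are taken as the medians of the lower and upper halves of the sorted data (other conventions exist and give slightly different values); the function names and the sample scores are hypothetical.

```ruby
# Median of an already sorted, non-empty array.
def median(sorted)
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Five-number summary: min, first quartile, median, third quartile, max.
# Assumption: quartiles are the medians of the lower and upper halves,
# excluding the middle value when the sample size is odd.
def five_number_summary(values)
  raise ArgumentError, "need at least two values" if values.length < 2
  sorted = values.sort
  half = sorted.length / 2
  lower = sorted[0, half]
  upper = sorted[sorted.length - half, half]
  [sorted.first.to_f, median(lower), median(sorted), median(upper),
   sorted.last.to_f]
end

# Hypothetical best scores for two configurations (not data from the book).
runs = {
  "config_a" => [280, 283, 281, 285, 282, 284, 279],
  "config_b" => [270, 275, 272, 278]
}
runs.each_value { |scores| puts five_number_summary(scores).join(" ") }
```

The column order (min, quartiles, max) matches the layout the text describes, so the using 0:2:1:5:4 clause in Listing 9.12 would select the box and whisker values from it.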



9.4.4 Plotting Candidate Solutions 

Visualizing candidate solutions can provide an insight into the complexity 
of the problem and the behavior of an algorithm. This section provides 
examples of visualizing candidate solutions in the context of their problem 
domains from both continuous and combinatorial function optimization. 

Continuous Function Optimization 

Visualizing candidate solutions from a continuous function optimization
domain at periodic times over the course of a run can provide an indication of
the algorithm's behavior in moving through a search space. In low dimensions
(such as one or two dimensions) this can provide qualitative insights into
the relationship between algorithm configurations and behavior.

Figure 9.7 provides a plot of the best solution found each iteration by
the Particle Swarm Optimization algorithm on the Basin function in two
dimensions (see Section 6.2). The positions of the candidate solutions are
projected on top of a heat map of the Basin function in two dimensions, with
the gradient representing the cost of solutions at each point. Listing 9.14
provides the Gnuplot script used to prepare the plot, where pso1.txt is a
file that contains the coordinates of the best solution found by the algorithm,
with one coordinate per line separated by a space. Listing 9.15 provides a
snippet of the first five lines of the pso1.txt file.

set xrange [-5:5]
set yrange [-5:5]
set pm3d map
set palette gray negative
set samples 20
set isosamples 20
splot x*x+y*y, "pso1.txt" using 1:2:(0) with points

Listing 9.14: Gnuplot script used to create a heat map and selected samples.

Figure 9.7: Heat map plot showing selected samples in the domain.

Listing 9.15: Snippet of the pso1.txt file.



Traveling Salesman Problem 

Visualizing the results of a combinatorial optimization can provide insight
into the areas of the problem that a selected technique is handling well, or
poorly. Candidate solutions can be visualized over the course of a run to
observe how the complexity of solutions found by a technique changes over
time. Alternatively, the best candidate solutions can be visualized at the
end of a run.

Candidate solutions for the TSP are easily visualized as tours (order of 
city visits) in the context of the city coordinates of the problem definition. 

Figure 9.8 provides a plot of an example Nearest-Neighbor solution for
the Berlin52 TSP. A Nearest-Neighbor solution is constructed by randomly
selecting the first city in the tour and then repeatedly selecting as the next
city the unvisited city with the minimum distance to the current city, until a
complete tour is created.

Figure 9.8: Plot of a Nearest-Neighbor tour for the Berlin52 TSP.

Listing 9.16 provides the Gnuplot script used to prepare the plot, where
berlin52.nn.tour is a file that contains a listing of the coordinates of all
cities separated by white space, in the order that the cities are visited, with
one city per line. The first city in the tour is repeated as the last city in the
tour to provide a closed polygon in the plot. Listing 9.17 provides a snippet
of the first five lines of the berlin52.nn.tour file.

plot "berlin52.nn.tour" with linespoints 

Listing 9.16: Gnuplot script for plotting a tour for a TSP. 



475 960 
525 1000 
510 875 
555 815 
575 665 



Listing 9.17: Snippet of the berlin52.nn.tour file. 
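
The Nearest-Neighbor construction and the closed-tour file layout described above can be sketched as follows. This is an illustration under assumptions: the start city is passed as a parameter (rather than chosen randomly) so that the result is repeatable, and the coordinates are hypothetical rather than the Berlin52 instance.

```ruby
# Euclidean distance between two cities given as [x, y] pairs.
def euclidean(a, b)
  Math.sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2)
end

# Build a tour by repeatedly moving to the closest unvisited city.
# `start` is the index of the first city in the tour.
def nearest_neighbor_tour(cities, start = 0)
  unvisited = cities.dup
  tour = [unvisited.delete_at(start)]
  until unvisited.empty?
    nearest = unvisited.min_by { |city| euclidean(tour.last, city) }
    tour << unvisited.delete_at(unvisited.index(nearest))
  end
  tour
end

# One "x y" line per city in visiting order, with the first city repeated
# at the end so that the plotted polygon is closed.
def tour_to_text(tour)
  (tour + [tour.first]).map { |x, y| "#{x} #{y}" }.join("\n")
end

# Hypothetical coordinates for illustration (not the Berlin52 instance).
cities = [[0, 0], [10, 0], [1, 1], [9, 1], [5, 5]]
puts tour_to_text(nearest_neighbor_tour(cities))
```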

Figure 9.9 provides a plot of the known optimal solution for the Berlin52
Traveling Salesman Problem.

Figure 9.9: Plot of the optimal tour for the Berlin52 TSP.

Listing 9.18 provides the Gnuplot script used to prepare the plot, where
berlin52.optimal is a file that contains a listing of the coordinates of all
cities in the order that the cities are visited, with one city per line separated
by white space. The first city in the tour is repeated as the last city in the
tour to provide a closed polygon in the plot.

plot "berlin52.optimal" with linespoints

Listing 9.18: Gnuplot script for plotting a tour for a TSP.

Listing 9.19 provides a snippet of the first five lines of the berlin52.optimal
file.

Listing 9.19: Snippet of the berlin52.optimal file.



9.4.5 Bibliography

9.5 Problem Solving Strategies

The field of Data Mining has clear methodologies that guide a practitioner
to solve problems, such as Knowledge Discovery in Databases (KDD) [16].
Metaheuristics and Computational Intelligence algorithms have no such
methodology.

This section describes some of the considerations when applying
algorithms from the fields of Metaheuristics, Computational Intelligence, and
Biologically Inspired Computation to practical problem domains. This
discussion includes:

• The suitability of application of a given technique to a given
problem and the transferability of algorithm and problem features
(Section 9.5.1).

• The distinction between strong and weak methods which use more
or less problem specific information respectively, and the continuum
between these extremes (Section 9.5.2).

• A summary of problem solving strategies that suggest different ways
of applying a given technique to the function optimization and
approximation fields (Section 9.5.3).

9.5.1 Suitability of Application 

From a problem-solving perspective, the tools that emerge from the field
of Computational Intelligence are generally assessed with regard to their
utility in efficiently or effectively solving problems. An important lesson
from the No-Free-Lunch Theorem was to bound claims of applicability (see
Section subsecmfl), that is, to consider the suitability of a given strategy
with regard to the feature overlap with the attributes of a given problem
domain. From a Computational Intelligence perspective, one may consider
the architecture, processes, and constraints of a given strategy as the features
of an approach.

The suitability of the application of a particular approach to a problem
takes into consideration concerns such as the appropriateness (can the
approach address the problem), the feasibility (available resources and
related efficiency concerns), and the flexibility (ability to address unexpected
or unintended effects). This section summarizes a general methodology
toward addressing the problem of suitability in the context of Computational
Intelligence tools. This methodology involves 1) the systematic elicitation
of system and problem features, and 2) the consideration of the
problem-problem, algorithm-algorithm, and problem-algorithm overlap of
feature sets.

Footnote: Some methods can be used for classification and regression and as
such may fit into methodologies such as KDD.

Systematic Feature Elicitation

A feature of a system (tool, strategy, model) or a problem is a distinctive
element or property that may be used to differentiate it from similar and/or
related cases. Examples may include functional concerns such as processes,
data structures, architectures, and constraints, as well as emergent concerns
that may have a more subjective quality such as general behaviors,
organizations, and higher-order structures. The process of the elicitation of
features may be taken from a system or problem perspective:

• System Perspective: This requires a strong focus on the lower level
functional elements and investigations that work toward correlating
specific controlled procedures with predictable emergent behaviors.

• Problem Perspective: May require both a generalization of the specific
case to the general problem as well as a functional or logical
decomposition into constituent parts.

Problem generalization and functional decomposition are important and 
commonly used patterns for problem solving in the broader fields of Artificial 
Intelligence and Machine Learning. The promotion of simplification and 
modularity can reduce the cost and complexity of achieving solutions [10, 43]. 

Feature Overlap 

Overlap in elicited features may be considered from three important
perspectives: between systems, between problems, and between a system and
a problem. Further, such overlap may be considered at different levels of
detail with regard to generalized problem solving strategies and problem
definitions. These overlap cases are considered as follows:

• System Overlap defines the suitability of comparing one system to
another, referred to as comparability. For example, systems may be
considered for the same general problems and compared in terms of
theoretical or empirical capability, the results of which may only be
meaningful if the systems are significantly similar to each other as
assessed in terms of feature overlap.

• Problem Overlap defines the suitability of comparing one problem to
another, referred to as transferability. From a systems focus,
transferability refers to the capability of a technique on a given problem
to be successfully applied to another problem, the result of which is only
meaningful if there is a strong overlap between the problems under
consideration.

• System-Problem Overlap defines the suitability of a system on a
given problem, referred to as applicability. For example, a system is
considered suitable for a given problem if it has a significant overlap
in capabilities with the requirements of the problem definition.

Such mappings are imprecise given the subjective assessment and
complexity required in both the elicitation of features and the consideration
of their overlap, the hardest of which is expected to be the mapping between
systems and problems. The mapping of salient features of algorithms and
problems was proposed as an important reconciliation of the No-Free-Lunch
Theorem by Wolpert and Macready [58], although the important difference
of this approach is that the system and algorithm are given prior to the
assessment. In their first work on the theorem, Wolpert and Macready
specifically propose the elicitation of the features from a problem-first
perspective, for which specialized algorithms can be defined [57]. Therefore,
this methodology of suitability may be considered a generalization of this
reconciliation suitable for the altered Computational Intelligence (strategy
first) perspective on Artificial Intelligence.

9.5.2 Strong and Weak Methods 

Generally, the methods from the fields of Metaheuristics, Computational
Intelligence, and Biologically Inspired Computation may be considered weak
methods. They are general purpose and are typically considered black box
solvers for a range of problem domains. The stronger the method, the more
that must be known about the problem domain. Rather than discriminating
techniques into weak and strong, it is more useful to consider a continuum of
methods from pure black box techniques that have few assumptions about
the problem domain, to strong methods that exploit most or all of the
problem specific information available.

For example, the Traveling Salesman Problem is an example of a
combinatorial optimization problem. A naive black box method (such as
Random Search) may simply explore permutations of the cities. Slightly
stronger methods may initialize the search with a heuristically generated
solution (such as nearest neighbor) and explore the search space using a
variation method that also exploits heuristic information about the domain
(such as a 2-opt variation). Continuing along this theme, a stochastic method
may explore the search space using a combination of probabilistic and
heuristic information (such as Ant Colony Optimization algorithms). At the
other end of the scale the stochastic elements are decreased or removed until
one is left with pure heuristic methods such as the Lin-Kernighan heuristic
[31] and exact algorithms from linear and dynamic programming that focus on
the structure and nature of the problem [55].
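
As an illustration of a variation method that exploits domain information, the following sketch applies 2-opt moves (reversing a segment of the tour) until no move shortens the tour. It is a minimal version under stated assumptions: distances are Euclidean, every improving move is accepted as soon as it is found, and the function names and the four-city example are hypothetical.

```ruby
# Euclidean distance between two cities given as [x, y] pairs.
def euclidean(a, b)
  Math.sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2)
end

# Total length of a closed tour (returning to the first city).
def tour_length(tour)
  (0...tour.length).sum do |i|
    euclidean(tour[i], tour[(i + 1) % tour.length])
  end
end

# 2-opt move: reverse the segment of the tour between positions i and j.
def two_opt_move(tour, i, j)
  tour[0...i] + tour[i..j].reverse + tour[(j + 1)..-1]
end

# Apply improving 2-opt moves until no move shortens the tour.
def two_opt(tour)
  best = tour
  improved = true
  while improved
    improved = false
    (1...(best.length - 1)).each do |i|
      ((i + 1)...best.length).each do |j|
        candidate = two_opt_move(best, i, j)
        if tour_length(candidate) < tour_length(best)
          best = candidate
          improved = true
        end
      end
    end
  end
  best
end

# A tour around a unit square whose edges cross (hypothetical example).
crossed = [[0, 0], [1, 1], [1, 0], [0, 1]]
puts tour_length(two_opt(crossed))
```

A tour produced by a nearest-neighbor construction could be passed to two_opt in the same way, which corresponds to the "slightly stronger" combination described above.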

Approaching a problem is not as simple as selecting the strongest method
available and solving it. The following describes two potential strategies:

• Start Strong: Select the strongest technique available and apply it
to the problem. Difficult problems can be resistant to traditional
methods for many intrinsic and extrinsic reasons. Use products from
a strong technique (best solution found, heuristics) to seed the next
weaker method in line.

• Start Weak: Strong methods do not exist for all problems, and if they
do exist, the computation, skill, and/or time resources may not be
available to exploit them. Start with a weak technique and use it to
learn about the problem domain. Use this information to make better
decisions about subsequent techniques to try that can exploit what
has been learned.

In a real-world engineering or business scenario, the objective is to solve
a problem or achieve the best possible solution to the problem within the 
operating constraints. Concerns of algorithm and technique purity become 
less important than they may be in their respective fields of research. Both 
of the above strategies suggest an iterative methodology, where the product 
or knowledge gained from one technique may be used to prime a subsequent 
stronger or weaker technique. 

9.5.3 Domain-Specific Strategies 

An algorithm may be considered a strategy for problem solving. There
is a wide range of ways in which a given algorithm can be used to solve
a problem. Function Optimization and Function Approximation were
presented as two general classes of problems to which the algorithms from
the fields of Metaheuristics, Computational Intelligence, and Biologically
Inspired Computation are applied. This section reviews general problem
solving strategies that may be adopted for a given technique in
each of these general problem domains.

Function Optimization 

This section reviews a select set of strategies for addressing optimization
problems from the fields of Metaheuristics and Computational Intelligence to
provide general insight into the state of the interaction between stochastic
algorithms and the field of optimization. This section draws heavily from
the fields of Evolutionary Computation, Swarm Intelligence, and related
Computational Intelligence sub-fields.

Global and Local Optimization Global Optimization refers to seeking
a globally optimal structure or approximation thereof in a given problem
domain. Global is differentiated from Local Optimization in that the latter
focuses on locating an optimal structure within a constrained region of
the decision variable search space, such as a single peak or valley (basin of
attraction). In the literature, global optimization problems refer to the class
of optimization problems that generally cannot be addressed through more
conventional approaches such as gradient descent methods (that require
mathematical derivatives) and pattern search (that can get 'stuck' in local
optima and never converge) [41, 53].

A global search strategy provides the benefit of making few if any 
assumptions about where promising areas of the search space may be, 
potentially highlighting unintuitive combinations of parameters. A local 
search strategy provides the benefit of focus and refinement of an existing 
candidate solution. It is common to apply a local search method to the 
solutions located by a global search procedure as a refinement strategy 
(such as using a Hill Climber (Section 2.4) after a Genetic Algorithm 
(Section 3.2)), and some methods have both techniques built in (such as 
GRASP in Section 2.8). 
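
The refinement strategy can be sketched on the Basin function as follows. This is a simplified illustration rather than the book's pairing: a uniform Random Search stands in for the global procedure, a stochastic hill climber with a bounded step size (a hypothetical parameter) plays the local role, and all function names are chosen here for the example.

```ruby
# Basin function: the sum of the squared components of a vector.
def basin(vector)
  vector.sum { |x| x**2 }
end

# Global phase: sample uniformly at random across the whole domain and
# keep the lowest-cost sample.
def random_search(dims, bounds, samples, rng)
  best = nil
  samples.times do
    candidate = Array.new(dims) do
      bounds[0] + (bounds[1] - bounds[0]) * rng.rand
    end
    best = candidate if best.nil? || basin(candidate) < basin(best)
  end
  best
end

# Local phase: take small random steps from the current solution and keep
# a step only if it does not increase the cost.
def hill_climb(start, bounds, step, iterations, rng)
  current = start
  iterations.times do
    candidate = current.map do |x|
      moved = x + step * (2.0 * rng.rand - 1.0)
      [[moved, bounds[0]].max, bounds[1]].min
    end
    current = candidate if basin(candidate) <= basin(current)
  end
  current
end

rng = Random.new(1)
bounds = [-5.0, 5.0]
coarse = random_search(2, bounds, 50, rng)
refined = hill_climb(coarse, bounds, 0.1, 200, rng)
puts "global: #{basin(coarse)} refined: #{basin(refined)}"
```

Because the local phase only accepts non-worsening steps, the refined solution can never cost more than the one handed over by the global phase.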

Parallel Optimization A natural step toward addressing difficult problems
(large and rugged cost landscapes) is to exploit parallel and distributed
hardware, to get an improved result in the same amount of time, the same
result in less time, or both [12]. Towards unifying the myriad of approaches
and hardware configurations, a general consensus and taxonomy has been
defined by the Parallel Evolutionary Algorithms (PEA) and Parallel
Metaheuristics fields that considers the ratio of communication to
computation, called granularity [4, 11].

This taxonomy is presented concisely by Alba and Tomassini as a plot 
or trade-off of three concerns: 1) the number of sub-populations (models or 
parallel strategies working on the problem), 2) the coupling between the 
sub-populations (frequency and amplitude of communication), and 3) the 
size of the sub-populations (size or extent of the sub-models) [5]. 

Two important and relevant findings from the narrower field of Parallel 
Evolutionary Algorithms include 1) that tight coupling (frequent inter- 
system migration of candidate solutions) between coarse-grained models 
typically results in worse performance than a non-distributed approach [6], 
and 2) that loose coupling (infrequent migration) between coarse-grained 
models has been consistently shown to provide a super-linear increase in 
performance [3, 7, 11]. 
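
To make the notion of coupling concrete, the following sketch simulates a coarse-grained (island) model in which the migration interval is the knob that controls how tightly the sub-populations are coupled. It is an illustration under stated assumptions rather than a method from the cited works: the islands are evolved sequentially in one process (no real parallel hardware is used), and the cost function, the mutation scheme, the ring migration policy, and all names are hypothetical.

```ruby
# Cost to minimize (assumed for illustration): sum of squares.
def cost(vector)
  vector.sum { |x| x * x }
end

# One generation on one island: mutate each member, keep the better of the two.
def evolve(island, rng, step = 0.1)
  island.map do |member|
    child = member.map { |x| x + (rng.rand * 2.0 - 1.0) * step }
    cost(child) < cost(member) ? child : member
  end
end

# Ring migration: the best of each island replaces the worst of the next.
def migrate(islands)
  bests = islands.map { |island| island.min_by { |m| cost(m) } }
  islands.each_with_index.map do |island, i|
    incoming = bests[(i - 1) % islands.size]
    worst = island.max_by { |m| cost(m) }
    island.map { |m| m.equal?(worst) ? incoming : m }
  end
end

# interval is the number of generations between migrations; a large value
# corresponds to loose coupling, a value of 1 to tight coupling.
def island_model(num_islands, island_size, generations, interval, rng)
  islands = Array.new(num_islands) do
    Array.new(island_size) { Array.new(2) { -5.0 + 10.0 * rng.rand } }
  end
  generations.times do |g|
    islands = islands.map { |island| evolve(island, rng) }
    islands = migrate(islands) if (g + 1) % interval == 0
  end
  islands.flatten(1).min_by { |m| cost(m) }
end

best = island_model(4, 10, 100, 25, Random.new(3))
puts "best cost: #{cost(best)}"
```

The sketch says nothing about the performance claims above; it only shows where the coupling frequency appears in such a model.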

Cooperative Search This is a more general approach that considers the
use of multiple models that work together to address difficult optimization
problems. Durfee et al. consider so-called Cooperative Distributed Problem
Solving (CDPS) in which a network of loosely coupled solvers is employed
to address complex distributed problems. In such systems, it is desirable 
to match the processing capabilities of the solver to the attributes of the 
problem. For example, a given problem may have spatially distributed, 
functionally distributed, or temporally distributed sub-problems to which a 
centralized and monolithic system may not be suitable.

Lesser [30] considers CDPS and proposes that such models perform
distributed search on dependent or independent and potentially overlapping
sub-problems as a motivating perspective for conducting research into
Distributed Artificial Intelligence (DAI)^. Lesser points out that in real
world applications, it is hard to get an optimal mapping between the
allocated resources and the needs or availability of information for a given
problem, suggesting that such problems may be caused by a mismatch in
processing times and/or number of sub-problems, interdependencies
between sub-problems, and local experts whose expertise cannot be effectively
communicated. For more detail on the relationships between parallel and
cooperative search, El-Abd and Kamel provide a rigorous taxonomy [15].

Hybrid Search Hybrid Search is a perspective on optimization that fo- 
cuses on the use of multiple and likely different approaches either sequentially 
(as in the canonical global and local search case), or in parallel (such as in 
Cooperative Search). For example in this latter case, it is common in the 
field of PEA to encourage different levels of exploration and exploitation 
across island populations by varying the operators or operator configurations 
used [2, 51]. 

Talbi proposed a detailed 4-level taxonomy of Hybrid Metaheuristics 
that concerns parallel and cooperating approaches [50]. The taxonomy 
encompasses parallel and cooperative considerations for optimization and fo- 
cuses on the discriminating features in the lowest level such as heterogeneity, 
and specialization of approaches. 

Functional Decomposition Three examples of a functional decomposi- 
tion of optimization include 1) multiple objectives, 2) multiple constraints, 
and 3) partitions of the decision variable search space. 

Multi-Objective Optimization (MOO) is a sub-field that is concerned 
with the optimization of two or more objective functions. A solution to
a MOO problem conventionally involves locating and returning a set of
candidate solutions called the non-dominated set [13]. The Pareto optimal
set is the set of optimal non-dominated solutions. For a given problem no
feasible solution exists that dominates a Pareto optimal solution. All
solutions that are Pareto optimal belong to the Pareto set, and the points
that these solutions map to in the objective space are called the Pareto front. The
complexity with MOO problems is in the typically unknown dependencies 
between decision variables across objectives, that in the case of conflicts, 
must be traded off (Purshouse and Fleming provide a taxonomy of such 
complexity [42]). 
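
The dominance relation behind the non-dominated set can be expressed in a few lines. The sketch below assumes that all objectives are to be minimized and that each candidate is represented directly by its vector of objective values; the function names and sample points are illustrative.

```ruby
# Returns true if objective vector a dominates b (minimization assumed):
# a is no worse in every objective and strictly better in at least one.
def dominates?(a, b)
  no_worse = a.zip(b).all? { |x, y| x <= y }
  better = a.zip(b).any? { |x, y| x < y }
  no_worse && better
end

# Keeps only the points that no other point dominates.
def non_dominated(points)
  points.reject { |p| points.any? { |q| dominates?(q, p) } }
end

points = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0], [5.0, 5.0]]
puts non_dominated(points).inspect
# => [[1.0, 5.0], [2.0, 3.0], [4.0, 1.0]]
```

Here [3.0, 4.0] is removed because [2.0, 3.0] is better in both objectives, and [5.0, 5.0] is removed because [1.0, 5.0] is better in the first objective and equal in the second.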

Constraint Satisfaction Problems (CSPs) involve the optimization of
decision variables under a set of constraints. The principal complexity in
such problems is in locating structures that are feasible or violate the least
number of constraints, optimizing such feasibility [27, 54].

^This perspective provided the basis for what became the field of Multi-Agent Systems
(MAS).

Search Space Partitioning involves partitioning of the decision variable 
search space (for example see Multispace Search by Gu et al. [14, 21, 22]). 
This is a critical consideration given that for equal-sized dimensional bounds 
on parameters, an increase in decision variables results in an exponential 
increase in the volume of the space to search. 
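
As a small worked illustration of this growth, the following computes the volume of a hypercube-shaped search space whose dimensions all share the same width. The width of 10.0 and the function name are assumptions made for the example.

```ruby
# Volume of a search space with the same width on every dimension.
def search_space_volume(width, dimensions)
  width**dimensions
end

[1, 2, 5, 10].each do |n|
  puts "#{n} variables: volume #{search_space_volume(10.0, n)}"
end
```

Each additional decision variable multiplies the volume by the width, so a fixed budget of samples covers an ever smaller fraction of the space.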

Availability Decomposition Optimization problems may be partitioned 
by the concerns of temporal and spatial distribution of 1) information 
availability, and 2) computation availability. An interesting area of research 
regarding variable information availability for optimization problems is 
called Interactive Evolutionary Computation, in which one or a collection 
of human operators dynamically interact with an optimization process [49].
Example problem domains include but are not limited to computer graphics, 
industrial design, image processing, and drug design. 

There is an increasing demand to exploit clusters of heterogeneous 
workstations to complete large-scale distributed computation tasks like 
optimization, typically in an opportunistic manner such as when individual 
machines are underutilized. The effect is that optimization strategies such 
as random partitioning of the search space (independent non-interacting pro- 
cessing) are required to take advantage of such environments for optimization 
problems [32, 46]. 

Meta Optimization One may optimize at a level above that considered 
in previous sections. Specifically, 1) the iterative generation of an inductive 
model called multiple restart optimization, and 2) the optimization of the 
parameters of the process that generates an inductive model of an optimiza-
tion problem. Multiple or iterative restarts involve multiple independent
algorithm executions from different (random) starting conditions. It is gen- 
erally considered as a method for achieving an improved result in difficult 
optimization problems where a given strategy is deceived by local or false 
optima [24, 34], typically requiring a restart schedule [17]. 
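
A multiple restart strategy can be sketched as a thin wrapper around any local method. In the example below the restart schedule is simply a fixed number of independent runs; the deceptive one-dimensional cost function, the hill climber, the step size, and the names are all assumptions made for illustration.

```ruby
# Deceptive cost (assumed): many local optima, global minimum at x = 0.
def cost(x)
  x * x + 10.0 * (1.0 - Math.cos(2.0 * Math::PI * x))
end

# A simple stochastic hill climber started from a given point.
def local_search(x, rng, iterations = 100, step = 0.05)
  iterations.times do
    candidate = x + (rng.rand * 2.0 - 1.0) * step
    x = candidate if cost(candidate) < cost(x)
  end
  x
end

# Independent runs from random starting points; keep the best result.
def multiple_restarts(restarts, rng)
  results = Array.new(restarts) do
    local_search(-5.0 + 10.0 * rng.rand, rng)
  end
  results.min_by { |x| cost(x) }
end

rng = Random.new(7)
single = local_search(-5.0 + 10.0 * rng.rand, rng)
best = multiple_restarts(20, rng)
puts "single run: #{cost(single)}, best of 20: #{cost(best)}"
```

With such a small step size a single run tends to settle in whichever basin it starts in, which is the situation in which restarts are expected to help.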

A second and well studied form of meta optimization involves the op- 
timization of the search process itself. Classical examples include the 
self-adaptation of mutation parameters (step sizes) in the Evolutionary 
Strategies (ES) and Evolutionary Programming (EP) approaches. Smith 
and Fogarty provided a review of genetic algorithms with adaptive strategies 
including a taxonomy in which the meta-adaptations are applied at one
of three levels: 1) the population (adapting the overall sampling strategy), 
2) the individual (adapting the creation of new samples in the decision 
variable space), and 3) components (modifying component contributions 
and/or individual step sizes as in ES and EP) [48].

Function Approximation

This section reviews a select set of strategies for addressing Function Approx-
imation problems from the fields of Artificial Intelligence and Computational
Intelligence to provide general insight into the state of the interaction
between stochastic algorithms and the field. The review draws heavily from 
the fields of Artificial Neural Networks, specifically Competitive Learning, 
as well as related inductive Machine Learning fields such as Instance Based 
Learning. 

Vector Quantization Vector Quantization (VQ) refers to a method 
of approximating a target function using a set of exemplar (prototype 
or codebook) vectors. The exemplars represent a discrete subset of the 
problem, generally restricted to the features of interest using the natural 
representation of the observations in the problem space, typically an
unconstrained n-dimensional real-valued space. The VQ method provides
the advantage of a non-parametric model of a target function (like instance-
based and lazy learning such as the k-Nearest-Neighbor method (kNN))
using a symbolic representation that is meaningful in the domain (like 
tree-based approaches). 

The promotion of compression addresses the storage and retrieval
concerns of kNN, although the selection of codebook vectors (the so-called
quantization problem) is a hard problem that is known to be NP-complete
[18]. More recently Kuncheva and Bezdek have worked towards unifying
quantization methods in the application to classification problems, referring
to the approaches as Nearest Prototype Classifiers (NPC) and proposing a
generalized nearest prototype classifier [28, 29].
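
The way a codebook is applied to a query can be sketched as a nearest prototype lookup. The example assumes Euclidean distance and a small hand-picked codebook; in practice the codebook vectors would be learned from observations, and the names used here are illustrative.

```ruby
# Euclidean distance between two real-valued vectors of equal length.
def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Returns the label of the codebook vector closest to the query.
def classify(codebook, query)
  nearest = codebook.min_by { |entry| euclidean_distance(entry[:vector], query) }
  nearest[:label]
end

# A hand-picked codebook (assumed for illustration).
codebook = [
  { vector: [0.0, 0.0], label: :a },
  { vector: [1.0, 1.0], label: :b }
]
puts classify(codebook, [0.2, 0.1]) # => a
puts classify(codebook, [0.9, 0.7]) # => b
```

Only the codebook has to be stored and searched, rather than every observation as in kNN, which is the compression benefit described above.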

Parallelization Instance-based approaches are inherently parallel given
the generally discrete independent nature in which they are used, specifically 
in a case or per-query manner. As such, parallel hardware can be exploited 
in the preparation of the corpus of prototypes (parallel preparation), and 
more so in the application of the corpus given its read-only usage [1, 35, 39]. 
With regard to vector quantization specifically, there is an industry centered 
around the design and development of VQ and WTA algorithms and circuits 
given their usage to compress digital audio and video data [36, 38]. 

Cooperative Methods Classical cooperative methods in the broader 
field of statistical machine learning are referred to as Ensemble Methods
[37, 40] or more recently Multiclassifier Systems [20]. 

Boosting is based on the principle of combining a set of quasi-independent 
weak learners that collectively are as effective as a single strong learner 
[26, 44]. The seminal approach is called Adaptive Boosting (AdaBoost), which
involves the preparation of a series of classifiers, where subsequent classifiers
are prepared for the observations that are misclassified by the preceding
classifier models (creation of specialists) [45].

Bootstrap Aggregation (bagging) involves partitioning the observations
into N randomly chosen subsets (with re-selection), and training a
different model on each [9]. Although robust to noisy datasets, the approach
requires careful consideration as to the consensus mechanism between the 
independent models for decision making. 
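
The two ingredients of bagging, sampling with re-selection and a consensus mechanism, can be sketched as follows. The base model here is a deliberately trivial stand-in that always predicts the majority label of its training sample, and majority voting is only one possible consensus mechanism; both, along with all names, are assumptions made for illustration.

```ruby
# Draws a bootstrap sample: same size as the data, chosen with replacement.
def bootstrap_sample(data, rng)
  Array.new(data.size) { data[rng.rand(data.size)] }
end

# Most frequent element of a list.
def most_common(items)
  items.group_by(&:itself).max_by { |_, group| group.size }.first
end

# A weak stand-in model (assumed): it ignores its input and always
# predicts the majority label of its training sample.
def train(sample)
  label = most_common(sample.map { |row| row[:label] })
  ->(_input) { label }
end

# Consensus mechanism: simple majority vote across the models.
def bagging_predict(models, input)
  most_common(models.map { |model| model.call(input) })
end

rng = Random.new(5)
data = Array.new(8) { { label: :pass } } + Array.new(2) { { label: :fail } }
models = Array.new(15) { train(bootstrap_sample(data, rng)) }
puts bagging_predict(models, nil)
```

Any learner that can be trained on a sample and queried for a prediction could take the place of the stand-in model.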

Stacked Generalization (stacking) involves creating a sequence of models 
of generally different types arranged into a stack, where subsequently added 
models generalize the behavior (success or failure) of the model before it 
with the intent of correcting erroneous decision making [52, 56]. 

Functional Decomposition As demonstrated, it is common in ensemble 
methods to partition the dataset either explicitly or implicitly to improve 
the approximation of the underlying target function. A first important 
decomposition involves partitioning the problem space into sub-spaces based 
on the attributes, regular groups of attributes called features, and decision 
attributes such as class labels. A popular method for attribute-based
partitioning is called the Random Subspace Method, involving the random
partitioning of attributes, where a specialized model is prepared for each
partition (commonly used on tree-based approaches) [23].

A related approach involves a hierarchical partitioning of the attribute space
into sub-vectors (sub-spaces) used to improve VQ-based compression [19].
Other important functional decomposition methods involve the partitioning
of the set of observations. There are many ways in which observations
may be divided, although common approaches include pre-processing using
clustering techniques to divide the set into natural groups, additional
statistical approaches that partition based on central tendency and outliers, and
re-sampling methods that are required to reduce the volume of observations.

Availability Decomposition The availability of the observations required to
address function approximation in real-world problem domains motivates the
current state of the art in Distributed Data Mining (DDM, or sometimes
Collective Data Mining), Parallel Data Mining (PDM), and Distributed
Knowledge Discovery in Databases (DKDD) [25]. The general information
availability concerns include 1) the intractable volume of observations, and 2)
the spatial (geographical) and temporal distribution of information [59]. In
many real-world problems it is infeasible to centralize relevant observations
for modeling, requiring scalable, load balancing, and incremental acquisition
of information [47].

Meta-Approximation The so-called ensemble or multiple-classifier
methods may be considered meta-approximation approaches as they are not
specific to a given modeling technique. As with function optimization,
meta-approaches may be divided into restart methods and meta-learning
algorithms. The use of restart methods is a standard practice for
connectionist approaches, and more generally in approaches that use random starting
conditions and a gradient or local search method of refinement.

The method provides an opportunity for overcoming local optima in
the error-response surface, when there is an unknown time remaining until 
convergence [33], and can exploit parallel hardware to provide a speed 
advantage [8]. Ensemble methods and variants are examples of meta ap- 
proximation approaches, as well as the use of consensus classifiers (gate 
networks in mixtures of experts) to integrate and weight the decision making 
properties from ensembles.

9.6 Benchmarking Algorithms

When it comes to evaluating an optimization algorithm, every researcher 
has their own thoughts on the way it should be done. Unfortunately, many 
empirical evaluations of optimization algorithms are performed and reported 
without addressing basic experimental design considerations. This section 
provides a summary of the literature on experimental design and empirical 
algorithm comparison methodology. This summary contains rules of thumb 
and the seeds of best practice when attempting to configure and compare 
optimization algorithms, specifically in the face of the no-free-lunch theorem. 

9.6.1 Issues of Benchmarking Methodology 

Empirically comparing the performance of algorithms on optimization prob- 
lem instances is a staple for the fields of Heuristics and Biologically Inspired 
Computation, and the problems of effective comparison methodology have 
been discussed since the inception of these fields. Johnson suggests that the 
coding of an algorithm is the easy part of the process; the difficult work 
is getting meaningful and publishable results [24]. He goes on to provide 
a very thorough list of questions to consider before racing algorithms, as
well as what he describes as his "pet peeves" within the field of empirical 
algorithm research. 

Hooker [22] (among others) practically condemns what he refers to
as competitive testing of heuristic algorithms, calling it "fundamentally
anti-intellectual". He goes on to strongly encourage a rigorous methodology
of what he refers to as scientific testing where the aim is to investigate
algorithmic behaviors.

Barr, Golden et al. [1] list a number of properties worthy of a heuristic
method making a contribution, which can be paraphrased as: efficiency,
efficacy, robustness, complexity, impact, generalizability, and innovation.
This is interesting given that many (perhaps a majority) of conference
papers focus on solution quality alone (one aspect of efficacy). In their
classical work on reporting empirical results of heuristics, Barr, Golden et al.
specify a loose experimental setup methodology with the following steps:

1. Define the goals of the experiment. 

2. Select measure of performance and factors to explore. 

3. Design and execute the experiment. 

4. Analyze the data and draw conclusions. 

5. Report the experimental results. 

They then suggest eight guidelines for reporting results, which in summary
are: reproducibility, specify all influential factors (code, computing
environment, etc.), be precise regarding measures, specify parameters, use
statistical experimental design, compare with other methods, reduce
variability of results, and ensure results are comprehensive. They then clarify
these points with examples.

Peer, Engelbrecht et al. [32] summarize the problems of algorithm
benchmarking (with a bias toward particle swarm optimization) to the following
points: duplication of effort, insufficient testing, failure to test against
state-of-the-art, poor choice of parameters, conflicting results, and invalid
statistical inference. Eiben and Jelasity [14] cite four problems with the
state of benchmarking evolutionary algorithms: 1) test instances are chosen
ad hoc from the literature, 2) results are provided without regard to research
objectives, 3) scope of generalized performance is generally too broad, and
4) results are hard to reproduce. Gent and Walsh provide a summary of
simple dos and don'ts for experimentally analyzing algorithms [20]. For
an excellent introduction to empirical research and experimental design in
artificial intelligence see Cohen's book "Empirical Methods for Artificial
Intelligence" [10].

The theme of the classical works on algorithm testing methodology is that 
there is a lack of rigor in the field. The following sections will discuss three 
main problem areas to consider before benchmarking, namely 1) treating 
algorithms as complex systems that need to be tuned before applied, 2) 
considerations when selecting problem instances for benchmarking, and 
3) the selection of measures of performance and statistical procedures for 
testing experimental hypotheses. A final section 4) covers additional best 
practices to consider. 

9.6.2 Selecting Algorithm Parameters 

Optimization algorithms are parameterized, although in the majority of 
cases the effect of adjusting algorithm parameters is not fully understood. 
This is because unknown non-linear dependencies commonly exist between 
the variables resulting in the algorithm being considered a complex sys- 
tem. Further, one must be careful when generalizing the performance of 
parameters across problem instances, problem classes, and domains. Finally, 
given that algorithm parameters are typically a mixture of real and integer 
numbers, exhaustively enumerating the parameter space of an algorithm is 
commonly intractable. 
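
Even after discretizing each parameter, the number of configurations grows multiplicatively with every parameter added. The sketch below enumerates a small grid; the parameter names and candidate values are hypothetical and do not come from a specific algorithm in this book.

```ruby
# Every combination of one value per parameter, as an array of hashes.
def enumerate_configurations(grid)
  names = grid.keys
  first, *rest = grid.values
  first.product(*rest).map { |values| names.zip(values).to_h }
end

# Hypothetical discretized parameter ranges for a genetic algorithm.
grid = {
  population_size: [20, 50, 100, 200],
  crossover_rate:  [0.6, 0.7, 0.8, 0.9, 1.0],
  mutation_rate:   [0.001, 0.005, 0.01, 0.05, 0.1]
}

configurations = enumerate_configurations(grid)
puts "#{configurations.size} configurations" # => 100 configurations
puts configurations.first.inspect
```

Three coarsely discretized parameters already give 100 configurations, each of which would need repeated runs on every problem instance of interest.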

There are many solutions to this problem such as self-adaptive
parameters, meta-algorithms (for searching for good parameter values), and
methods of performing sensitivity analysis over parameter ranges. A good
introduction to the parameterization of genetic algorithms is Lobo, Lima
et al. [27]. The best and self-evident place to start (although often ignored
[14]) is to investigate the literature and see what parameters have been used
historically. Although not a robust solution, it may prove to be a useful 
starting point for further investigation. The traditional approach is to
run an algorithm on a large number of test instances and generalize the
results [37]. We, as a field, haven't really come much further than this 
historical methodology other than perhaps the application of more and 
differing statistical methods to decrease effort and better support findings. 

A promising area of study involves treating the algorithm as a complex
system, where problem instances may become yet another parameter of
the model [7, 36]. From here, sensitivity analysis can be performed in 
conjunction with statistical methods to discover parameters that have the 
greatest effect [8] and perhaps generalize model behaviors. 

Francois and Lavergne [18] mention the deficiencies of the traditional
trial-and-error and experienced-practitioner approaches to parameter tuning,
further suggesting that seeking general rules for parameterization will lead to
optimization algorithms that offer neither convergent nor efficient behaviors.
They offer a statistical model for evolutionary algorithms that describes
a functional relationship between algorithm parameters and performance.
Nannen and Eiben [29, 30] propose a statistical approach called REVAC
(previously Calibration and Relevance Estimation) to estimating the
relevance of parameters in a genetic algorithm. Coy, Golden et al. [12] use a
statistical steepest descent procedure for locating good parameters
for metaheuristics on many different combinatorial problem instances.

Bartz-Beielstein [3] used a statistical experimental design methodol- 
ogy to investigate the parameterization of the Evolutionary Strategy (ES) 
algorithm. A sequential statistical methodology is proposed by Bartz- 
Beielstein, Parsopoulos et al. [4] for investigating the parameterization and 
comparisons between the Particle Swarm Optimization (PSO) algorithm, 
the Nelder-Mead Simplex Algorithm (direct search), and the Quasi-Newton 
algorithm (derivative-based). Finally, an approach that is popular within 
the metaheuristic and Ant Colony Optimization (ACO) community is to use 
automated Monte Carlo and statistical procedures for sampling the discretized
parameter space of algorithms on benchmark problem instances [6]. Similar
racing procedures have also been applied to evolutionary algorithms [41]. 

9.6.3 Problem Instances 

This section focuses on issues related to the selection of function optimiza- 
tion test instances, but the general theme of cautiously selecting problem 
instances is generally applicable. 

Common lists of test instances include: De Jong [25], Fogel [17], and
Schwefel [38]. Yao, Lui et al. [40] list many canonical test instances as
do Schaffer, Caruana et al. [37]. Gallagher and Yuan [19] review test
function generators and propose a tunable mixture of Gaussians test problem
generator. Finally, McNish [28] proposes using fractal-based test problem
generators via a web interface. 

The division of test problems into classes is another axiom of modern
optimization algorithm research, although the issues with this methodology
are the taxonomic criteria for problem classes and the selection of
problem instances for classes.

Eiben and Jelasity [14] strongly support the division of problem instances 
into categories and encourage the evaluation of optimization algorithms over 
a large number of test instances. They suggest classes could be natural 
(taken from the real world), or artificial (simplified or generated). In 
their paper on understanding the interactions of GA parameters, Deb and
Agrawal [13] propose four structural properties of problems for testing
genetic algorithms: multi-modality, deception, isolation, and collateral noise.
Yao, Lui et al. [40] divide their large test dataset into the categories of
unimodal, 'multimodal-many local optima', and 'multimodal-few local
optima'. Whitley, Rana et al. [39] provide a detailed study on the problems of
selecting test instances for genetic algorithms. They suggest that difficult
problem instances should be non-linear, non-separable, and non-symmetric.

English [15] suggests that many functions in the field of EC are selected 
based on structures in the response surface (as demonstrated in the above 
examples), and that they inherently contain a strong Euclidean bias. The 
implication is that the algorithms already have some a priori knowledge 
about the domain built into them and that results are always reported on 
a restricted problem set. This is a reminder that instances are selected to 
demonstrate algorithmic behavior, rather than performance. 

9.6.4 Measures and Statistical Methods 

There are many ways to measure the performance of an optimization
algorithm for a problem instance, although the most common involves a quality
(efficacy) measure of solution(s) found (see the following for lists and
discussion of common performance measures [1, 4, 5, 14, 23]). Most biologically
inspired optimization algorithms have a stochastic element, typically in their 
starting position(s) and in the probabilistic decisions made during sampling 
of the domain. Thus, the performance measurements must be repeated a 
number of times to account for the stochastic variance, which could also be 
a measure of comparison between algorithms. 
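
Repeating a stochastic run and summarizing the results can be sketched as follows. The stand-in optimizer (best of 100 uniform random samples of a one-dimensional quadratic), the choice of 30 repetitions, and the function names are assumptions made for illustration; the standard deviation uses the sample form with n - 1 in the denominator.

```ruby
# Arithmetic mean of a list of numbers.
def mean(values)
  values.sum.to_f / values.size
end

# Sample standard deviation (n - 1 in the denominator).
def standard_deviation(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / (values.size - 1))
end

# Stand-in for one run of a stochastic optimizer (assumed): the best cost
# among 100 uniform random samples of f(x) = x^2 on [-1, 1].
def run_once(rng)
  Array.new(100) { (rng.rand * 2.0 - 1.0)**2 }.min
end

# Each repetition uses its own seed so that the runs are independent
# and the whole experiment can be reproduced.
results = Array.new(30) { |i| run_once(Random.new(i)) }
puts "mean=#{mean(results)} stdev=#{standard_deviation(results)}"
```

Recording the seeds alongside the results also supports the reproducibility guidelines summarized earlier in this section.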

Irrespective of the measures used, sound statistical experimental design 
requires the specification of 1) a null hypothesis (no change), 2) alternative 
hypotheses (difference, directional difference), and 3) acceptance or rejection 
criteria for the hypothesis. The null hypothesis is commonly stated as the 
equality between two or more central tendencies (means or medians) of a
quality measure in a typical case of comparing stochastic-based optimization 
algorithms on a problem instance. 

Peer, Engelbrecht et al. [32] and Birattari and Dorigo [5] provide a basic
introduction (suitable for an algorithm-practitioner) into the appropriateness
of various statistical tests for algorithm comparisons. For a good introduction
to statistics and data analysis see Peck et al. [31], for an introduction to
non-parametric methods see Hollander and Wolfe [21], and for a detailed
presentation of parametric and nonparametric methods and their suitability
of application see Sheskin [23]. For an excellent open source software package
for performing statistical analysis on data see the R Project.^

To summarize, parametric statistical methods are used for interval and
ratio data (like a real-valued performance measure), and nonparametric
methods are used for ordinal, categorical and rank-based data. Interval data
is typically converted to ordinal data when salient constraints of desired
parametric tests (such as assumed normality of distribution) are broken
such that the less powerful nonparametric tests can be used. The use of
nonparametric statistical tests may be preferred as some authors [9, 32] claim
the distribution of cost values is very asymmetric and/or not Gaussian. It
is important to remember that most parametric tests degrade gracefully.

Chiarandini, Basso et al. [9] provide an excellent case study for using 
the permutation test (a nonparametric statistical method) to compare 
stochastic optimizers by running each algorithm once per problem instance, 
and multiple times per problem instance. While rigorous, their method 
appears quite complex and their results are difficult to interpret. 

Barrett, Marathe et al. [2] provide a rigorous example of applying the 
parametric test Analysis of Variance (ANOVA) of three different heuristic 
methods on a small sample of scenarios. Reeves and Wright [34, 35] also
provide an example of using ANOVA in their investigation into epistasis on 
genetic algorithms. In their tutorial on the experimental investigation of 
heuristic methods, Rardin and Uzsoy [33] warn against the use of statistical 
methods, claiming their rigidity as a problem, and the importance of practi- 
cal significance over that of statistical significance. They go on in the face 
of their own objections to provide an example of using ANOVA to analyze 
the results of an illustrative case study. 

Finally, Peer, Engelbrecht et al. [32] highlight a number of case study
example papers that use statistical methods inappropriately. In their
OptiBench system and method, algorithm results are standardized, ranked
according to three criteria and compared using the Wilcoxon Rank-Sum
test, a non-parametric alternative to the commonly used Student's t-test.
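
The statistic at the heart of a rank-sum comparison is straightforward to compute. The sketch below pools two samples of results, ranks them (averaging tied ranks), and returns the rank sum W of the first sample together with the equivalent U statistic. It is not the OptiBench procedure, it stops short of a p-value (which would come from tables or a package such as R), and the sample values and names are invented for illustration.

```ruby
# Assigns ranks (1 = smallest) to the values, averaging tied ranks.
def average_ranks(values)
  sorted = values.sort
  values.map do |v|
    first = sorted.index(v) + 1  # rank of the first occurrence
    last = sorted.rindex(v) + 1  # rank of the last occurrence
    (first + last) / 2.0
  end
end

# Rank sum W of sample a within the pooled data, and the matching U.
def rank_sum(a, b)
  ranks = average_ranks(a + b)
  w = ranks.first(a.size).sum
  u = w - a.size * (a.size + 1) / 2.0
  [w, u]
end

# Invented best-cost results from two algorithms (lower is better).
algorithm_a = [0.12, 0.15, 0.11, 0.19, 0.14]
algorithm_b = [0.22, 0.18, 0.25, 0.21, 0.20]
w, u = rank_sum(algorithm_a, algorithm_b)
puts "W=#{w} U=#{u}" # => W=16.0 U=1.0
```

A U value close to zero, as here, indicates that nearly all results of the first sample rank below those of the second.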

9.6.5 Other 

Another pervasive problem in the field of optimization is the reproducibility 
(implementation) of an algorithm. An excellent solution to this problem 
is making source code available by creating or collaborating with open- 
source software projects. This behavior may result in implementation 
standardization, a reduction in the duplication of effort for experimentation 
and repeatability, and perhaps more experimental accountability [14, 32]. 

^R Project is online at http://www.r-project.org

Peer, Engelbrecht et al. [32] stress the need to compare to the state-of-
the-art implementations rather than the historic canonical implementations
to give a fair and meaningful evaluation of performance.

Another area that is often neglected is that of algorithm descriptions, 
particularly in regard to reproducibility. Pseudocode is often used, although 
(in most cases) in an inconsistent manner and almost always without refer- 
ence to a recognized pseudocode standard or mathematical notation. Many 
examples are a mix of programming languages, English descriptions and 
mathematical notation, making them difficult to follow, and commonly 
impossible to implement in software due to incompleteness and ambiguity. 

An excellent tool for comparing optimization algorithms in terms of
their asymptotic behavior from the field of computational complexity is the
Big-O notation [11]. In addition to clarifying aspects of the algorithm, it
provides a problem-independent way of characterizing an algorithm's space
and/or time complexity.

9.6.6 Summary 

It is clear that there is no silver bullet to experimental design for empirically 
evaluating and comparing optimization algorithms, although there are as 
many methods and options as there are publications on the topic. The 
field of stochastic optimization has not yet agreed upon general methods 
of application like the field of data mining (processes such as Knowledge 
Discovery in Databases (KDD) [16]). Although these processes are not 
experimental methods for comparing machine learning algorithms, they do 
provide a general model to encourage the practitioner to consider important 
issues before application of an approach. 

Finally, it is worth pointing out a somewhat controversially titled paper 
by De Jong [26] that provides a reminder that although the genetic algorithm 
has been shown to solve function optimization, it is not innately a function 
optimizer, and function optimization is only a demonstration of this complex 
adaptive system's ability to learn. It is a reminder to be careful not to 
link an approach too tightly with a domain, particularly if the domain was 
chosen for demonstration purposes.

Appendix A

Ruby: Quick-Start Guide 



A.1 Overview

All code examples in this book are provided in the Ruby programming 
language. This appendix provides a high-level introduction to the Ruby 
programming language. This guide is intended for programmers of an
existing imperative programming language (such as Python, Java, C,
C++, C#) to learn enough Ruby to be able to interpret and modify the 
code examples provided in the Clever Algorithms project. 

A.2 Language Basics

This section summarizes the basics of the language, including variables, flow 
control, data structures, and functions. 

A.2.1 Ruby Files 

Ruby is an interpreted language, meaning that programs are typed as text
into a .rb file which is parsed and executed at the time the script is run. For
example, the following snippet shows how to invoke the Ruby interpreter on
a script in the file genetic_algorithm.rb from the command line: ruby
genetic_algorithm.rb

Ruby scripts are written in ASCII text and are parsed and executed 
in a linear manner (top to bottom). A script can define functionality (as 
modules, functions, and classes) and invoke functionality (such as calling a 
function). 

Comments in Ruby are defined by a # character, after which the remainder 
of the line is ignored. The only exception is in strings, where the 
character can have a special meaning. 
The Ruby interpreter can be used in an interactive manner by typing 
out a Ruby script directly. This can be useful for testing specific behavior. 
You are encouraged to open the interactive interpreter and follow along 
with this guide by typing out the examples. The interactive interpreter can 
be opened from the command line by typing irb and exited again by typing 
exit from within the interpreter. 

A.2.2 Variables 

A variable holds a piece of information such as an integer, a floating 
point number, a boolean, or a string. 

a = 1 # a holds the integer value '1' 
b = 2.2 # b holds the floating point value '2.2' 
c = false # c holds the boolean value false 
d = "hello, world" # d holds the string value 'hello, world' 

Ruby has a number of different data types (such as numbers and strings) 
although it does not enforce the type safety of variables. Instead it uses 
'duck typing', where as long as the value of a variable responds appropriately 
to the messages it receives, the interpreter is happy. 
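
As a minimal sketch of duck typing, the following example passes unrelated 
objects to the same function. The classes Duck and Robot and the function 
describe are hypothetical names invented for this illustration. The function 
never checks the type of its argument, only whether it responds to the 
speak message. 

```ruby
# Two unrelated hypothetical classes that both respond to :speak.
class Duck
  def speak
    "Quack"
  end
end

class Robot
  def speak
    "Beep"
  end
end

# No type is declared for 'thing'; it only needs to respond to :speak.
def describe(thing)
  return "cannot speak" unless thing.respond_to?(:speak)
  thing.speak
end

results = [Duck.new, Robot.new, 42].map { |thing| describe(thing) }
puts results.inspect # => ["Quack", "Beep", "cannot speak"]
```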

Strings can be constructed from static text as well as the values of 
variables. The following example defines a variable and then defines a string 
that contains the variable. The #{} is a special sequence that informs the 
interpreter to evaluate the contents inside the brackets, in this case to 
evaluate the variable n, which happens to be assigned the value 55. 

n = 55 # an integer 

s = "The number is: #{n}" # => The number is: 55 
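
The #{} sequence can hold any expression, not only a variable name, and it 
is only evaluated inside double-quoted strings. The following short sketch 
illustrates both points; the variable names are chosen for illustration only. 

```ruby
n = 55

# Any expression can appear inside #{}, not just a variable name.
doubled = "Twice the number is: #{n * 2}"
puts doubled # => Twice the number is: 110

# Single-quoted strings are not interpolated; the sequence is kept literally.
literal = 'The number is: #{n}'
puts literal # => The number is: #{n}
```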

The values of variables can be compared using the == for equality and 
!= for inequality. The following provides an example of testing the equality 
of two variables and assigning the boolean (true or false) result to a third 
variable. 

a = 1 

b = 2 

c = (a == b) # false 

Ruby supports the classical && and || for AND and OR, but it also 
supports the and and or keywords themselves. 

a = 1 
b = 2 
c = (a==1 and b==2) # true 
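
One caveat is worth a small example: the and and or keywords have lower 
precedence than assignment, whereas && and || have higher precedence. 
The sketch below uses values chosen so that the two forms give different 
results when the parentheses are left out. 

```ruby
a = 1
b = 3

# && binds tighter than =, so the whole expression is assigned.
c = a == 1 && b == 2 # parsed as c = ((a == 1) && (b == 2))
puts c.inspect # => false

# 'and' binds looser than =, so only the left comparison is assigned.
d = a == 1 and b == 2 # parsed as (d = (a == 1)) and (b == 2)
puts d.inspect # => true

# Parentheses make the keyword form behave like &&.
e = (a == 1 and b == 2)
puts e.inspect # => false
```

When the result of a boolean expression is assigned to a variable, wrapping 
the expression in parentheses (or using && and ||) avoids this surprise. 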



A.2.3 Flow Control 



A script is a sequence of statements that invoke pre-defined functionality. 
There are structures for manipulating the flow of control within the script 
such as conditional statements and loops. 

Conditional statements can take the traditional forms of if condition 
then action, with the standard variants of if-then-else and if-then-elsif. For 
example: 



a = 1 
b = 2 
if (a == b) 
  a += 1 # equivalent to a = a + 1 
elsif a == 1 # brackets around conditions are optional 
  a = 1 # this line is executed 
else 
  a = 0 
end 



Conditional statements can also be added to the end of statements. For 
example a variable can be assigned a value only if a condition holds, defined 
all on one line. 



a = 2 

b = 99 if a == 2 # b => 99 



Loops allow a set of statements to be repeatedly executed while a 
condition holds or until a condition is met. 



a = 0 

while a < 10 # condition before the statements 

puts a += 1 
end 



b = 10 
begin 

puts b -= 1 

end until b==0 # condition after the statements 



As with the if conditions, the loops can be added to the end of statements 
allowing a loop on a single line. 



a = 0 

puts a += 1 while a<10 
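
The position of the condition matters when it is false from the start. The 
following sketch contrasts the single-line form, which tests the condition 
before the statement, with the begin and end form, which always executes 
its statements once before testing. The counter names are illustrative. 

```ruby
# Single-line form: the condition is tested first, so the statement never runs.
modifier_runs = 0
modifier_runs += 1 while false
puts modifier_runs # => 0

# begin...end form: the statements run once before the condition is tested.
block_runs = 0
begin
  block_runs += 1
end while false
puts block_runs # => 1
```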



A.2.4 Arrays and Hashes 

An array is a linear collection of variables and can be defined by creating a 
new Array object. 



a = [] # define a new array implicitly 
a = Array.new # explicitly create a new array 
a = Array.new(10) # create a new array with space for 10 items 



The contents of an array can be accessed by the index of the element. 

a = [1, 2, 3] # inline declaration and definition of an array 
b = a[0] # first element, equivalent to a.first 

Arrays are not fixed in size, and elements can be added and deleted 
dynamically. 

a = [1, 2, 3] # inline declaration and definition of an array 
a << 4 # => [1, 2, 3, 4] 
a.delete_at(0) # => returns 1, a is now [2, 3, 4] 
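
A few further array operations are often handy when reading code. The 
sketch below is illustrative rather than exhaustive and uses only standard 
Array methods. 

```ruby
a = [1, 2, 3]

last = a[-1]    # negative indices count back from the end
missing = a[10] # reading past the end returns nil rather than raising an error

a.push(4)       # same effect as a << 4
removed = a.pop # removes and returns the last element

puts last.inspect    # => 3
puts missing.inspect # => nil
puts removed.inspect # => 4
puts a.size          # => 3
puts a.inspect       # => [1, 2, 3]
```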

A hash is an associative array, where values can be stored and accessed 
using a key. A key can be an object (such as a string) or a symbol. 

h = {} # empty hash 
h = Hash.new 

h = {"A"=>1, "B"=>2} # string keys 
a = h["A"] # => 1 

h = {:a=>1, :b=>2} # label (symbol) keys 
a = h[:a] # => 1 
h[:c] = 3 # add new key-value combination 
h[:d] # => nil as there is no value 
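
Because a missing key returns nil, it can be useful to ask whether a key 
is present or to supply a fallback value. The following sketch shows a few 
standard Hash methods for this, along with enumeration of the key-value 
pairs. The hash name scores is chosen for illustration. 

```ruby
scores = {:a => 1, :b => 2}

# each yields the key and the value of every pair.
scores.each { |key, value| puts "#{key} = #{value}" }

# key? asks whether a key is present at all.
has_c = scores.key?(:c)
puts has_c # => false

# fetch returns the supplied fallback instead of nil for a missing key.
fallback = scores.fetch(:c, 0)
puts fallback # => 0

keys = scores.keys
puts keys.inspect # => [:a, :b]
```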



A.2.5 Functions and Blocks 

The puts function can be used to write a line to the console. 

puts("Testing 1, 2, 3") # => Testing 1, 2, 3 

puts "Testing 4, 5, 6" # note brackets are not required for the function call 

Functions allow a program to be partitioned into discrete actions that are 
pre-defined and reusable. The following is an example of a simple function. 

def test_function() 
  puts "Test!" 
end 

test_function # => Test! 



A function can take a list of variables called function arguments. 

def test_function(a) 
  puts "Test: #{a}" 
end 

test_function("me") # => Test: me 



Function arguments can have default values, meaning that if the argument 
is not provided in a call to the function, the default is used. 

def test_function(a="me") 
  puts "Test: #{a}" 
end 

test_function() # => Test: me 
test_function("you") # => Test: you 



A function can return a variable, called a return value. 

def square(x) 
  return x**2 # note the ** is a power-of operator in Ruby 
end 

puts square(3) # => 9 

A block is a collection of statements that can be treated as a single 
unit. A block can be provided to a function and it can be provided with 
parameters. A block can be defined using curly brackets {} or the do and 
end keywords. Parameters to a block are signified by |var|. 

The following example shows an array with a block passed to the 
constructor of the Array object. The block accepts the index of the element 
currently being initialized as a parameter and returns the value with which 
to initialize that element. 



b = Array.new(10) {|i| i} # define a new array initialized 0..9 

# do...end block 
b = Array.new(10) do |i| # => [0, 1, 4, 9, 16, 25, 36, 49, 64, 81] 
  i * i 
end 
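
A function can also be written to accept a block of its own. As a minimal 
sketch, the hypothetical function build_vector below (not part of Ruby or 
of the book's listings) uses the yield keyword to call the block it was 
given, mimicking the behavior of the Array constructor. 

```ruby
# Hypothetical helper: calls the supplied block once per index and
# collects the results, much like Array.new(size) { |i| ... }.
def build_vector(size)
  vector = []
  size.times { |i| vector << yield(i) }
  vector
end

# in-line block
squares = build_vector(5) { |i| i * i }
puts squares.inspect # => [0, 1, 4, 9, 16]

# do...end block
doubles = build_vector(3) do |i|
  i * 2
end
puts doubles.inspect # => [0, 2, 4]
```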











Everything is an object in Ruby, even numbers, and as such everything 
has some behaviors defined. For example, an integer has a .times function 
that can be called that takes a block as a parameter, executing the block 
the integer number of times. 

10.times {|i| puts i} # prints 0..9 each on a new line 



A.3 Ruby Idioms 

There are standard patterns for performing certain tasks in Ruby, such 
as assignment and enumerating. This section presents the common Ruby 
idioms used throughout the code examples in this book. 

A.3.1 Assignment 

Assignment is the definition of variables (setting a variable to a value). Ruby 
allows mass assignment; for example, multiple variables can be assigned 
their respective values on a single line. 

a,b,c = 1,2,3 



Ruby also has special support for arrays, where variables can be 
mass-assigned from the values in an array. This can be useful if a function 
returns an array of values which are mass-assigned to a collection of variables. 

a, b, c = [1, 2, 3] 

def get_min_max(vector) 
  return [vector.min, vector.max] 
end 

v = [1,2,3,4,5] 
min, max = get_min_max(v) # => 1, 5 
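
Mass assignment has a few further behaviors worth knowing when reading 
code. The sketch below is illustrative: it swaps two variables, collects 
the remaining values of an array with the * (splat) operator, and shows 
what happens when there are fewer values than variables. 

```ruby
# Swap two variables without a temporary variable.
a, b = 1, 2
a, b = b, a
puts "#{a}, #{b}" # => 2, 1

# The * (splat) operator collects the remaining values into an array.
first, *rest = [1, 2, 3, 4]
puts first        # => 1
puts rest.inspect # => [2, 3, 4]

# Variables without a matching value are assigned nil.
x, y, z = [1, 2]
puts z.inspect # => nil
```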



A.3.2 Enumerating 

Those collections that are enumerable, such as arrays, provide convenient 
functions for visiting each value in the collection. A very common idiom is 
the use of the .each and .each_with_index functions on a collection, both 
of which accept a block. These functions are typically used with an in-line 
block {} so that they fit onto one line. 



[1,2,3,4,5].each {|v| puts v} # in-line block 

# a do...end block 
[1,2,3,4,5].each_with_index do |v,i| 
  puts "#{i} = #{v}" 
end 





The sort function is a very heavily used enumeration function. It returns 
a copy of the collection that is sorted. 



a = [3, 2, 4, 1] 
a = a.sort # => [1, 2, 3, 4] 



There are a few versions of the sort function including a version that 
takes a block. This version of the sort function can be used to sort the 
variables in the collection using something other than the actual direct 
values in the array. This is heavily used in code examples to sort arrays 
of hash maps by a particular key-value pair. The <=> operator is used to 
compare two values together, returning a -1, 0, or 1 if the first value is 
smaller, the same, or larger than the second. 

a = [{:quality=>2}, {:quality=>3}, {:quality=>1}] 
a = a.sort {|x,y| x[:quality]<=>y[:quality]} # => ordered by quality 
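
The same idea can be expressed in a few related ways. The sketch below 
uses a hypothetical population array of hashes: sort_by orders by a 
computed key, swapping the operands of <=> reverses the order, and max_by 
selects a single element without sorting. 

```ruby
population = [{:quality=>2}, {:quality=>3}, {:quality=>1}]

# sort_by takes a block that returns the value to sort on.
ascending = population.sort_by { |c| c[:quality] }
puts ascending.inspect # => ordered by quality, smallest first

# Swapping x and y in the comparison gives a descending order.
descending = population.sort { |x, y| y[:quality] <=> x[:quality] }
puts descending.inspect # => ordered by quality, largest first

# max_by returns the single element with the largest block value.
best = population.max_by { |c| c[:quality] }
puts best.inspect
```

All three calls return new objects and leave the population array itself 
in its original order. 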
A.3.3 Function Names 

Given that everything is an object, executing a function on an object (a 
behavior) can be thought of as sending a message to that object. For some 
messages sent to objects, there is a convention to adjust the function name 
accordingly. For example, functions that ask a question of an object (return 
a boolean) have a question mark (?) on the end of the function name. Those 
functions that change the internal state of an object (its data) have an 
exclamation mark on the end (!). When working with an imperative script 
(a script without objects) this convention applies to the data provided as 
function arguments. 

def is_rich?(amount) 
  return amount >= 1000 
end 

puts is_rich?(99) # => false 

def square_vector!(vector) 
  vector.each_with_index {|v,i| vector[i] = v**2} 
end 

v = [2,2] 
square_vector!(v) 
puts v.inspect # => [4, 4] 
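
Ruby's built-in classes follow the same convention. As a short 
illustration using standard Array methods, empty? and include? ask a 
question of an array, while sort returns a sorted copy and sort! changes 
the array itself. 

```ruby
v = [3, 1, 2]

# Functions ending in ? return a boolean.
puts v.empty?      # => false
puts v.include?(2) # => true

# sort returns a sorted copy and leaves the original unchanged.
sorted = v.sort
puts sorted.inspect # => [1, 2, 3]
puts v.inspect      # => [3, 1, 2]

# sort! changes the array itself.
v.sort!
puts v.inspect # => [1, 2, 3]
```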



A.3.4 Conclusions 

This quick-start guide has only scratched the surface of the Ruby 
Programming Language. Please refer to one of the referenced textbooks on 
the language for a more detailed introduction to this powerful and fun 
programming language [1, 2]. 